SlideShare una empresa de Scribd logo
1 de 55
How to apply yourHow to apply your
taxonomy to yourtaxonomy to your
contentcontent
AUTOMATICALLYAUTOMATICALLY
Marjorie M.K. Hlava
President
Access Innovations / Data Harmony
mhlava@accessinn.com
www.dataharmony.com
www.accessinn.com
505-998-0800
Three things to cover
today
1. Operationalizing Auto Indexing (MAI)
2. Where is MAI using a taxonomy used now
3. How does the MAI work?
Operationalizing the
taxonomy
• Expected organizational change
• How is it put into place - Getting it applied Getting
people to understand
• Getting the organization to embrace the taxonomy
• Two areas to consider
– Technical aspects of implementation
– Cultural Aspects of changes use patterns
Expected organizational
change
• Find and discover ability improved!
• Automatic implementation
• Tagging everything
• Document management plan
• Make it part of the infrastructure
• Thinking in outlines
• Everyone uses the same outline - taxonomy
Getting the organization to
embrace the taxonomy 
• Use the MAI as an aid
–Indexing = applying the taxonomy terms
–Automatic or background indexing
–Index on Save or submit
–Automatically index the old documents
whenever they are seen or requested
again
• Make it easy for the users – auto in the
background
Assumptions
• Taxonomy has been created
• Tested on the documents, pages etc.
• Agreed to by the major stakeholders
• Is ready to apply to the records
– There is a field to put this subject metadata in
– The use is established –
• Where the terms will be used
• How the terms will be used
• Maintenance is established
– Search log feeds
– Feedback from users
– Wired into new product development needs
Where is it put into place?
• Tag / index content
– Metadata records
– Full text inline
• Web navigation
– Browse – navigation tree
– NavBar
– Search
• Product catalogs
• Customer Service – find the answers quickly
Inline Tagging
Show the exact point where the
concept is mentioned
Mouse-over to view the term record
Statistical summary, showing the
number of times each term is
mentioned in the article
Show the exact point where the
concept is mentioned
Mouse-over to view the term record
Statistical summary, showing the
number of times each term is
mentioned in the article
Copyright © 2013 Access Innovations, Inc.
Taxonomy DrivenTaxonomy Driven
Search PresentationSearch Presentation
Copyright © 2013 Access Innovations, Inc.
HTML Headers
META NAME KEYWORD
Use the taxonomy here
Taxonomies in new ways
• Mashups
• Visualization of your content
• Trend analysis
• Content segmentation / repackaging
• Author / people profiles
• Place and people disambiguation
• Data cross walks
• In Sharepoint
Authors at a placeMASHUP locations to a
GPS grid of an area
Two data points
GPS Coordinates
Taxonomy description of the
Copyright © 2013 Access Innovations, Inc.
15
Visualization Strategies
Matrix
Visualization
Software
Copyright © 2013 Access Innovations, Inc.
All Data Up-posted
To The Top Level
Copyright © 2013 Access Innovations, Inc.
Copyright © 2013 Access Innovations, Inc.
Pattern Analysis Domain
Associations
Copyright © 2013 Access Innovations, Inc.
More Like This - Recommender
Cancer Epidemiology Biomarkers & Prevention
Vol. 12, 161-164,
February 2003
© 2003 American Association for Cancer Research
Short Communications
Alcohol, Folate, Methionine, and Risk of Incident Breast Cancer in the
American Cancer Society Cancer Prevention Study II Nutrition Cohort
Heather Spencer Feigelson1
, Carolyn R. Jonas, Andreas S. Robertson,
Marjorie L. McCullough, Michael J. Thun and Eugenia E. Calle Department
of Epidemiology and Surveillance Research, American Cancer Society, National
Home Office, Atlanta, Georgia 30329-4251
Recent studies suggest that the increased risk of breast cancer associated with
alcohol consumption may be reduced by adequate folate intake. We examined
this question among 66,561 postmenopausal women in the American Cancer
Society Cancer Prevention Study II Nutrition Cohort.
Related Press Releases
•How What and How Much We Eat (And Drink) Affects Our Risk
•Novel COX-2 Combination Treatment May Reduce Colon Cance
Combination Regimen of COX-2 Inhibitor and Fish Oil Causes C
•COX-2 Levels Are Elevated in Smokers
Related AACR Workshops and Conferences
•Frontiers in Cancer Prevention Research
•Continuing Medical Education (CME)
•Molecular Targets and Cancer Therapeutics
Related Meeting Abstracts
•Association between dietary folate intake, alcohol intake, and m
•Folate, folate cofactor, and alcohol intakes and risk for colorect
•Dietary folate intake and risk of prostate cancer in a large prosp
Related Working Groups
•Finance
•Charter
•Molecular Epidemiology
Related Education Book Content
Oral Contraceptives, Postmenopausal Hormones, and
Physical Activity and Cancer
Hormonal Interventions: From Adjuvant Therapy to
Breast Cancer Prevention
Related Awards
•AACR-GlaxoSmithKline Clinical Cancer Research Scholar Awards
•ACS Award
•Weinstein Distinguished Lecture
Webcasts
Related Webcasts
Think Tank Report
Related Think Tank Report
Content
Copyright © 2013 Access Innovations, Inc.
Link to Society Resources
Journal
Article on
Topic A
Other
Journal
Articles on
Topic A
Upcoming
Conference
on Topic A
Podcast Interview
with Researcher
Working on Topic A
Grant Available
for Researchers
Working on
Topic A
CME
Activity on
Topic A
Job Posting
for Expert
on Topic A
Copyright © 2013 Access Innovations, Inc.
Thesaurus MasterThesaurus Master
Machine Aided
Indexer
(M.A.I.™)
Reposito
ry
Term
Store
Reposito
ry
Term
Store
Search
Presentation:
90% accuracy
Browse by Subject
Auto-completion
Broader Terms
Narrower Terms
Related Terms
Search
Presentation:
90% accuracy
Browse by Subject
Auto-completion
Broader Terms
Narrower Terms
Related Terms
Inline Tagging
Metadata and
Entity Extractor
Automatic
Summarization
Search
Software
Search
Software
Client
taxonomy
Client
taxonomy
Taxonomy In Sharepoint
Copyright © 2013 Access Innovations, Inc.
Editorial Workflow Integration
Author Submission Module
The author fills in the data to the document template, attaching images and
graphs as necessary.
An API calls Data Harmony and generates a list of indexing terms based on
the content.
Copyright © 2013 Access Innovations, Inc.
A Quick Look
Behind the Scenes
Database
Management
System
Thesaurus
tool
Indexing
Tool
MAI
•Validate terms
•Add terms and rules
•Change terms and rules
•Delete terms and rules
•Search thesaurus
•Validate term entry
•Block invalid terms
•Record candidates
•Establish rules for term
use
•Auto Suggest indexing
terms
Copyright © 2013 Access Innovations, Inc.
Database
Management
System Add
Metadata using
MAI
Search
Inverted File
Aadvark
Alligator
Apple
Advantage
….
Zebra
Record locator
Accessinn.com/12345/demofile/recid15
Database
records
Each with
many
elements
Portal Searching
Data Feed Portal page
Thesaurus /
taxonomy
system
add Metadata
Data Base
Management
System
Automatic
Metadata
Suggestions
Search
Data Base
Management
System
Add Metadata
using MAI
Search
Use taxonomy to search
• Consistency
– Machine Aided Indexing consistently suggests the same term
under the same conditions.
– Initial tests on units from client proved to be 20% more
consistent than manual indexing.
• Deeper Indexing
• Machine Aided Indexing may suggest more terms per unit than
manual indexers working under a per unit quota.
• Faster Processing
• Human verification of suggested indexing terms can be seven (7)
times as fast as manual indexing.
• Fully automatic indexing can run 8 million articles in 2 hours in
the Amazon Cloud Service
• Full text indexing article(30 page articles) at less than a minute
per article07/25/13 Copyright (c) 1998 Access Innovations, Inc.
Advantages of MAI
07/25/13 Copyright (c) 1998 Access Innovations, Inc.
Use of Thesaurus
• Faster Rule Building
• Determines Depth of Indexing
• Increases Measurable Consistency
• Provides Base for Statistical
Computations
07/25/13 Copyright (c) 1998 Access Innovations, Inc.
MAI System Overview --
System Components
1. Thesaurus / Taxonomy
2. MAI Program
a. Rule Builder
b. MAI Engine (matches text to terms)
c. Statistics Computation
3. Knowledge Base / Rule base
4. Editorial Evaluation
How does auto indexing work?
• The Modules
• The syntax
• The process
• How do you know if it’s good? Statistics /
evaluation
• Rules of thumb
07/25/13
Copyright (c) 1998 Access Innovations, Inc.
Knowledge Base Creative
Loop
07/25/13 Copyright (c) 1998 Access Innovations, Inc.
Edit • Interactive text editor for building rules
• Provides insertion, deletion, copying of
characters or marked blocks
Reports • Creates Knowledge Base listings
• Can output all rules, or selected rules
MAI - Rule Builder
07/25/13 Copyright (c) 1998 Access Innovations, Inc.
MAI – Concept Extractor
Input • Reads client’s electronic file format
• Creates internal format for scanning
• Modified project by project
Interpreter • Scans reformatted input, selects phrases
• Looks up text phrases in Knowledge Base
• Interprets rule(s) associated with matched phrase
Output • Produces electronic file of scanned units with
suggested index terms for production of print file
• Optionally produces data file for statistics
program
07/25/13 Copyright (c) 1998 Access Innovations, Inc.
MAI - Statistics Computation
How do you know if indexing is accurate?
• Accepts optional data file created by MAI engine
while processing units with existing index terms
• Computes MAI ‘Hit’, ‘Miss’, ‘Noise’ figures Reports
• Produces listing of terms with unit numbers
categorized as ‘Hit’, ‘Miss’, or ‘Noise’
Rules Type
• Simple Condition (80%)
• Identity
• synonym
• Complex Rules:
• Compound Condition
• Multiple Condition
Explanation
• If (condition is true) use term ‘A’
• Use term ‘B’ where ‘A’ (matched
text) is synonymous with, but not
identical to, ‘B’
• If (condition 1 is true and/or
condition 2 is true…and/or
condition n is true) use term ‘A’
• If (condition 1 is true…) use term ‘A’
else if (condition 2 is true…) use
term ‘B’ else if (condition n is true…)
use term ‘C’
07/25/13 Copyright (c) 1998 Access Innovations, Inc.
Rule Types
• Condition Type
• Proximity
• Location
• Format
• Truncation
• Explanation
• Helps to determine context of a
phrase by its relative location to
another indicative phrase.
• Determines where in the unit a
phrase occurs
• Checks the matched text for certain
conditions such as capitalization,
format (italic codes) or affixes
(prefix or suffix to root word).
• Left or right word stem isolation07/25/13 Copyright (c) 1998 Access Innovations, Inc.
More options
• NEAR “string”
– If the matched text is within 3 words before or after
another “string” in the same sentence.
• WITH “string”
– If the matched text is in the same sentence as another
“string”
• MENTIONS “string”
– If the matched text is in the same abstract as another
“string”
• AROUND “string”
– If the matched text is in the same text as another “string”
07/25/13
Proximity Conditions
• in title
– If the matched text is in “field”
• in abstract
– If the matched text is in “field”
• begin sentence
– If the matched text is located at the beginning of the
sentence
• end sentence
– If the matched text is located at the end of the sentence
07/25/13 Copyright (c) 1998 Access Innovations, Inc.
Location Conditions
• all caps
– If the matched text is all caps
• initial caps
– If the matched text begins with a capital letter
• match “string 1”,
– If the matched text begins with, ends “string 2” with, or is
included within specific character strings(s), such as
“<i>”, “</i>” or quotation sets
07/25/13
Format Conditions
• Affix “string1”,
– • If the matched text is a root word in“string2”
the Knowledge Base which is prefixed (string1) or
suffixed (string2) by specific character string(s),
such as “pseudo” or “ization”.
• Either string1 or string2 may be null(“”).
• Additionally, it will allow wild card characters such
as “*”.
07/25/13
Format Conditions (
07/25/13
Simple Rules: Identity
text: artifact
rule: use artifact
text: accelerator mass spectrometry
rule: use accelerator mass spectrometry
text: National Socialism
rule: use National Socialism
use fascism
07/25/13
Simple Rules: Synonym
text: testing
rule: use analysis
text: the history of
rule: use history
text: micrographs
rule: use scanning electron microscopy
text: casoni
rule: use adobe
07/25/13
Complex Rules: Simple Condition
Automatically created
text: cave
rule: if (near “Sandia”)
use Sandia cave site
text: packing
rule: if (near “shipping”)
use packing and shipping
text: chin
rule: if (all caps)use
Canadian Heritage Information Network
07/25/13
Complex Rules: Compound Condition
text: congress
rule: if (initial caps and not begin sentence)
and (with “u.s.” or with “united states”))
use U.S. Congress
text: dye
rule: if (in title and (near “natural” or
with “vegetable”)) use natural dyeing
text: x-ray
rule: if (near “energy dispersive” or mentions
“(xrd)” or (near “fluorescence” and with
“spectroscopy”))
use energy-dispersive x-ray fluorescence
spectroscopy
07/25/13 Copyright (c) 1998 Access Innovations, Inc.
Complex MAI Rule
Rule: survey
12804: IF (near "geological" OR with "geophysics" OR with "oil recovery")
USE geophysical prospecting
ENDIF
IF (mentions "interstellar" OR mentions "star" OR mentions "stars"
OR mentions "Universe" OR mentions "stellar"
OR mentions "galaxies" OR mentions "protostars" OR mentions "protostar"
OR mentions "galactic" OR mentions "galaxy" OR mentions "radio source"
OR mentions "HII regions" OR mentions "nebula" OR mentions "nova"
OR mentions "supernova" OR mentions "nebulae" OR mentions "novae")
USE astronomy and astrophysics
ENDIF
IF (mentions "discusses" OR mentions "presents" OR mentions
"the author" OR mentions "this article" OR mentions "this book"
OR mentions "this paper" OR mentions "the article" OR mentions
"the book" OR mentions "the paper")
USE reviews
ENDIF
• HIT
– When the MAI engine generates an indexing term
which is identical to an index term which would
have been assigned by a human indexer
• MISS
– When the MAI engine fails to generate an
indexing term which would have been assigned by
a human indexer.
• NOISE
– When the MAI engine generates an indexing term
which is incorrect, out of context, or illogical.
07/25/13
MAI Statistics Terms
• CONSISTENCY
– When topical or descriptive indexing terms are assigned in
the same way from article to article over time.
• DEPTH
– A qualitative measure of the level and range of detail of
the indexing (ie, from broader terms to narrower terms of
a single concept).
• BREADTH
– A qualitative measure of the number or range of concepts
indexed in an article.
07/25/13 Copyright (c) 1998 Access Innovations, Inc.
MAI Statistics Terms
07/25/13 Copyright (c) 1998 Access Innovations, Inc.
• Aim for 85% of higher accuracy
– Hits of misses and noise
• Maintain monthly
– Thesaurus / taxonomy updates
– From search logs
– From new sources covered
– From new initiatives and term usage
• Maintain MAI with term changes and additions
– 80% of rules are automatic match rules
– Should create 6 – 10 rules per hour
• Do regular random sampling to improve the indexing quality 1%
• Reindex when taxonomy changes reach 5% of the term base
• Hire people good at logic and word games
07/25/13 Copyright (c) 1998 Access Innovations, Inc.
Rules of thumb
Cost items
• Software
– Two main types –
• Bayesian / statistical / co-occurrence
• Rules based
• Educating the system
– Training the system based on collection of records where the terms
are used correctly
– 20% of terms may need disambiguation
• Implementing the system
– Setting the vectors (programmers)
– Applying the rules (editors / taxonomists)
• Ongoing maintenance
– Resetting the vectors (programmers)
– Updating the terms and rules (editors / taxonomists)
• A. MAI System Components
• Thesaurus
• Knowledge Base
• MAI Program
• Editorial Evaluation
• B. Use of Thesaurus
• Word Standardization
• Faster Rule Building
• Increased Measurable Consistency
07/25/13 Copyright (c) 1998 Access Innovations, Inc.
MAI Summary
• C. Knowledge Base Components
• Simple Rules
• Complex Rules
• D. MAI Program Modules
• Rule Builder Module
• MAI Engine Module
• Statistics Computation
• E. Advantages of Rules based MAI
• Consistency
• Deeper Indexing
• Faster Thru Put
07/25/13 Copyright (c) 1998 Access Innovations, Inc.
MAI Summary
Taxonomy Is core to the system
Thank youThank you
Marjorie M.K. Hlava
President
Access Innovations / Data Harmony
mhlava@accessinn.com
www.dataharmony.com
www.accessinn.com
505-998-0800

Más contenido relacionado

Similar a How to Apply Your Taxonomy to Your Content Automatically

Bioschemas Workshop
Bioschemas WorkshopBioschemas Workshop
Bioschemas WorkshopNiall Beard
 
Climbing the Slippery Slope of SharePoint Migrations Webinar
Climbing the Slippery Slope of SharePoint Migrations WebinarClimbing the Slippery Slope of SharePoint Migrations Webinar
Climbing the Slippery Slope of SharePoint Migrations WebinarConcept Searching, Inc
 
Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...
Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...
Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...Concept Searching, Inc
 
Introduction to Anzo Unstructured
Introduction to Anzo UnstructuredIntroduction to Anzo Unstructured
Introduction to Anzo UnstructuredCambridge Semantics
 
Webinar - The Swiss Army Knife for SharePoint 2010 – Tagging, Term Store and ...
Webinar - The Swiss Army Knife for SharePoint 2010 – Tagging, Term Store and ...Webinar - The Swiss Army Knife for SharePoint 2010 – Tagging, Term Store and ...
Webinar - The Swiss Army Knife for SharePoint 2010 – Tagging, Term Store and ...martingarland
 
Taxonomy and tagging – manual tagging does not work!
Taxonomy and tagging – manual tagging does not work!Taxonomy and tagging – manual tagging does not work!
Taxonomy and tagging – manual tagging does not work!Concept Searching, Inc
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Sonya Liberman
 
How to Get the Most Out of Search Webinar
How to Get the Most Out of Search WebinarHow to Get the Most Out of Search Webinar
How to Get the Most Out of Search WebinarConcept Searching, Inc
 
How to Get Enterprise Search Right Webinar
How to Get Enterprise Search Right WebinarHow to Get Enterprise Search Right Webinar
How to Get Enterprise Search Right WebinarConcept Searching, Inc
 
Model Confidence for Master Data with David Loshin
Model Confidence for Master Data with David LoshinModel Confidence for Master Data with David Loshin
Model Confidence for Master Data with David LoshinEmbarcadero Technologies
 
How to Jump Start Taxonomy Content Creation webinar slides 9 24 15
How to Jump Start Taxonomy Content Creation webinar slides 9 24 15How to Jump Start Taxonomy Content Creation webinar slides 9 24 15
How to Jump Start Taxonomy Content Creation webinar slides 9 24 15Alicia Harapko
 
Intelligent Metadata Enabled Migration with SharePoint
Intelligent Metadata Enabled Migration with SharePointIntelligent Metadata Enabled Migration with SharePoint
Intelligent Metadata Enabled Migration with SharePointConcept Searching, Inc
 
Webinar: Does the SharePoint 2010 Term Store Seem Like Alphabet Soup? Find ...
Webinar:  Does the SharePoint 2010 Term Store Seem Like Alphabet Soup?  Find ...Webinar:  Does the SharePoint 2010 Term Store Seem Like Alphabet Soup?  Find ...
Webinar: Does the SharePoint 2010 Term Store Seem Like Alphabet Soup? Find ...martingarland
 
Governance policies records
Governance policies recordsGovernance policies records
Governance policies recordsKashish Sukhija
 
SharePoint 2013 governance model
SharePoint 2013 governance modelSharePoint 2013 governance model
SharePoint 2013 governance modelYash Goley
 

Similar a How to Apply Your Taxonomy to Your Content Automatically (20)

Bioschemas Workshop
Bioschemas WorkshopBioschemas Workshop
Bioschemas Workshop
 
Taxonomy Fundamentals - SLA 2014
Taxonomy Fundamentals - SLA 2014Taxonomy Fundamentals - SLA 2014
Taxonomy Fundamentals - SLA 2014
 
Climbing the Slippery Slope of SharePoint Migrations Webinar
Climbing the Slippery Slope of SharePoint Migrations WebinarClimbing the Slippery Slope of SharePoint Migrations Webinar
Climbing the Slippery Slope of SharePoint Migrations Webinar
 
Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...
Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...
Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...
 
Introduction to Anzo Unstructured
Introduction to Anzo UnstructuredIntroduction to Anzo Unstructured
Introduction to Anzo Unstructured
 
Webinar - The Swiss Army Knife for SharePoint 2010 – Tagging, Term Store and ...
Webinar - The Swiss Army Knife for SharePoint 2010 – Tagging, Term Store and ...Webinar - The Swiss Army Knife for SharePoint 2010 – Tagging, Term Store and ...
Webinar - The Swiss Army Knife for SharePoint 2010 – Tagging, Term Store and ...
 
What’s New in Semantic Enrichment?
What’s New in Semantic Enrichment?What’s New in Semantic Enrichment?
What’s New in Semantic Enrichment?
 
Taxonomy and tagging – manual tagging does not work!
Taxonomy and tagging – manual tagging does not work!Taxonomy and tagging – manual tagging does not work!
Taxonomy and tagging – manual tagging does not work!
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
 
How to Get the Most Out of Search Webinar
How to Get the Most Out of Search WebinarHow to Get the Most Out of Search Webinar
How to Get the Most Out of Search Webinar
 
How to Get Enterprise Search Right Webinar
How to Get Enterprise Search Right WebinarHow to Get Enterprise Search Right Webinar
How to Get Enterprise Search Right Webinar
 
Model Confidence for Master Data with David Loshin
Model Confidence for Master Data with David LoshinModel Confidence for Master Data with David Loshin
Model Confidence for Master Data with David Loshin
 
How to Jump Start Taxonomy Content Creation webinar slides 9 24 15
How to Jump Start Taxonomy Content Creation webinar slides 9 24 15How to Jump Start Taxonomy Content Creation webinar slides 9 24 15
How to Jump Start Taxonomy Content Creation webinar slides 9 24 15
 
Intelligent Metadata Enabled Migration with SharePoint
Intelligent Metadata Enabled Migration with SharePointIntelligent Metadata Enabled Migration with SharePoint
Intelligent Metadata Enabled Migration with SharePoint
 
Webinar: Does the SharePoint 2010 Term Store Seem Like Alphabet Soup? Find ...
Webinar:  Does the SharePoint 2010 Term Store Seem Like Alphabet Soup?  Find ...Webinar:  Does the SharePoint 2010 Term Store Seem Like Alphabet Soup?  Find ...
Webinar: Does the SharePoint 2010 Term Store Seem Like Alphabet Soup? Find ...
 
SharePoint Fest Chicago Presentation
SharePoint Fest Chicago PresentationSharePoint Fest Chicago Presentation
SharePoint Fest Chicago Presentation
 
Data Harmony update 2020 final
Data Harmony update 2020 finalData Harmony update 2020 final
Data Harmony update 2020 final
 
Data Harmony Update 2020 final
Data Harmony Update 2020 finalData Harmony Update 2020 final
Data Harmony Update 2020 final
 
Governance policies records
Governance policies recordsGovernance policies records
Governance policies records
 
SharePoint 2013 governance model
SharePoint 2013 governance modelSharePoint 2013 governance model
SharePoint 2013 governance model
 

Más de Access Innovations, Inc.

Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy Results
Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy ResultsMaking AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy Results
Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy ResultsAccess Innovations, Inc.
 
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8Access Innovations, Inc.
 
Hindawi taxonomy and personalization 27.10 (1)
Hindawi taxonomy and personalization 27.10 (1)Hindawi taxonomy and personalization 27.10 (1)
Hindawi taxonomy and personalization 27.10 (1)Access Innovations, Inc.
 
Asco using ai-taxos-for meta-titles-february-2021
Asco using ai-taxos-for meta-titles-february-2021Asco using ai-taxos-for meta-titles-february-2021
Asco using ai-taxos-for meta-titles-february-2021Access Innovations, Inc.
 
Ai webinar 2 -what's in a name (consolidated pdf)
Ai webinar 2 -what's in a name (consolidated pdf)Ai webinar 2 -what's in a name (consolidated pdf)
Ai webinar 2 -what's in a name (consolidated pdf)Access Innovations, Inc.
 
Tagging overview - Why Keywords Don't Cut It
Tagging overview  - Why Keywords Don't Cut ItTagging overview  - Why Keywords Don't Cut It
Tagging overview - Why Keywords Don't Cut ItAccess Innovations, Inc.
 
DHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository InteroperabilityDHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository InteroperabilityAccess Innovations, Inc.
 
DHUG 2017 - Understanding ROI Just Enough to Get Your Project Funded
DHUG 2017 - Understanding ROI Just Enough to Get Your Project FundedDHUG 2017 - Understanding ROI Just Enough to Get Your Project Funded
DHUG 2017 - Understanding ROI Just Enough to Get Your Project FundedAccess Innovations, Inc.
 

Más de Access Innovations, Inc. (20)

Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy Results
Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy ResultsMaking AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy Results
Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy Results
 
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
 
Smart submit
Smart submitSmart submit
Smart submit
 
Plos taxonomy beyond search dhug 2021
Plos taxonomy beyond search   dhug 2021Plos taxonomy beyond search   dhug 2021
Plos taxonomy beyond search dhug 2021
 
Hindawi taxonomy and personalization 27.10 (1)
Hindawi taxonomy and personalization 27.10 (1)Hindawi taxonomy and personalization 27.10 (1)
Hindawi taxonomy and personalization 27.10 (1)
 
Data harmonycloudpowerpointclientfacing
Data harmonycloudpowerpointclientfacingData harmonycloudpowerpointclientfacing
Data harmonycloudpowerpointclientfacing
 
Data harmony update 2021
Data harmony update 2021 Data harmony update 2021
Data harmony update 2021
 
Atypon dhug2021
Atypon dhug2021Atypon dhug2021
Atypon dhug2021
 
Asco using ai-taxos-for meta-titles-february-2021
Asco using ai-taxos-for meta-titles-february-2021Asco using ai-taxos-for meta-titles-february-2021
Asco using ai-taxos-for meta-titles-february-2021
 
Asce more than just topic taxonomies
Asce more than just topic taxonomiesAsce more than just topic taxonomies
Asce more than just topic taxonomies
 
Acs discoverability-dhug2021
Acs discoverability-dhug2021Acs discoverability-dhug2021
Acs discoverability-dhug2021
 
Ai webinar 2 -what's in a name (consolidated pdf)
Ai webinar 2 -what's in a name (consolidated pdf)Ai webinar 2 -what's in a name (consolidated pdf)
Ai webinar 2 -what's in a name (consolidated pdf)
 
Tagging overview - Why Keywords Don't Cut It
Tagging overview  - Why Keywords Don't Cut ItTagging overview  - Why Keywords Don't Cut It
Tagging overview - Why Keywords Don't Cut It
 
Health Affairs - Why Keywords Don't Cut It
Health Affairs - Why Keywords Don't Cut ItHealth Affairs - Why Keywords Don't Cut It
Health Affairs - Why Keywords Don't Cut It
 
Why Keywords Don't Cut It
Why Keywords Don't Cut ItWhy Keywords Don't Cut It
Why Keywords Don't Cut It
 
DHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository InteroperabilityDHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository Interoperability
 
DHUG 2018 - Florida Thesis OCR
DHUG 2018 - Florida Thesis OCRDHUG 2018 - Florida Thesis OCR
DHUG 2018 - Florida Thesis OCR
 
DHUG 2017 - Understanding ROI Just Enough to Get Your Project Funded
DHUG 2017 - Understanding ROI Just Enough to Get Your Project FundedDHUG 2017 - Understanding ROI Just Enough to Get Your Project Funded
DHUG 2017 - Understanding ROI Just Enough to Get Your Project Funded
 
DHUG 2017 - Thesaurus Construction Training
DHUG 2017 - Thesaurus Construction TrainingDHUG 2017 - Thesaurus Construction Training
DHUG 2017 - Thesaurus Construction Training
 
DHUG 2017 - Access Integrity
DHUG 2017 - Access IntegrityDHUG 2017 - Access Integrity
DHUG 2017 - Access Integrity
 

Último

SWD (Short wave diathermy)- Physiotherapy.ppt
SWD (Short wave diathermy)- Physiotherapy.pptSWD (Short wave diathermy)- Physiotherapy.ppt
SWD (Short wave diathermy)- Physiotherapy.pptMumux Mirani
 
Apiculture Chapter 1. Introduction 2.ppt
Apiculture Chapter 1. Introduction 2.pptApiculture Chapter 1. Introduction 2.ppt
Apiculture Chapter 1. Introduction 2.pptkedirjemalharun
 
Hematology and Immunology - Leukocytes Functions
Hematology and Immunology - Leukocytes FunctionsHematology and Immunology - Leukocytes Functions
Hematology and Immunology - Leukocytes FunctionsMedicoseAcademics
 
PNEUMOTHORAX AND ITS MANAGEMENTS.pdf
PNEUMOTHORAX   AND  ITS  MANAGEMENTS.pdfPNEUMOTHORAX   AND  ITS  MANAGEMENTS.pdf
PNEUMOTHORAX AND ITS MANAGEMENTS.pdfDolisha Warbi
 
Let's Talk About It: To Disclose or Not to Disclose?
Let's Talk About It: To Disclose or Not to Disclose?Let's Talk About It: To Disclose or Not to Disclose?
Let's Talk About It: To Disclose or Not to Disclose?bkling
 
The next social challenge to public health: the information environment.pptx
The next social challenge to public health:  the information environment.pptxThe next social challenge to public health:  the information environment.pptx
The next social challenge to public health: the information environment.pptxTina Purnat
 
Measurement of Radiation and Dosimetric Procedure.pptx
Measurement of Radiation and Dosimetric Procedure.pptxMeasurement of Radiation and Dosimetric Procedure.pptx
Measurement of Radiation and Dosimetric Procedure.pptxDr. Dheeraj Kumar
 
PERFECT BUT PAINFUL TKR -ROLE OF SYNOVECTOMY.pptx
PERFECT BUT PAINFUL TKR -ROLE OF SYNOVECTOMY.pptxPERFECT BUT PAINFUL TKR -ROLE OF SYNOVECTOMY.pptx
PERFECT BUT PAINFUL TKR -ROLE OF SYNOVECTOMY.pptxdrashraf369
 
See the 2,456 pharmacies on the National E-Pharmacy Platform
See the 2,456 pharmacies on the National E-Pharmacy PlatformSee the 2,456 pharmacies on the National E-Pharmacy Platform
See the 2,456 pharmacies on the National E-Pharmacy PlatformKweku Zurek
 
PULMONARY EDEMA AND ITS MANAGEMENT.pdf
PULMONARY EDEMA AND  ITS  MANAGEMENT.pdfPULMONARY EDEMA AND  ITS  MANAGEMENT.pdf
PULMONARY EDEMA AND ITS MANAGEMENT.pdfDolisha Warbi
 
COVID-19 (NOVEL CORONA VIRUS DISEASE PANDEMIC ).pptx
COVID-19  (NOVEL CORONA  VIRUS DISEASE PANDEMIC ).pptxCOVID-19  (NOVEL CORONA  VIRUS DISEASE PANDEMIC ).pptx
COVID-19 (NOVEL CORONA VIRUS DISEASE PANDEMIC ).pptxBibekananda shah
 
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptxSYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptxdrashraf369
 
METHODS OF ACQUIRING KNOWLEDGE IN NURSING.pptx by navdeep kaur
METHODS OF ACQUIRING KNOWLEDGE IN NURSING.pptx by navdeep kaurMETHODS OF ACQUIRING KNOWLEDGE IN NURSING.pptx by navdeep kaur
METHODS OF ACQUIRING KNOWLEDGE IN NURSING.pptx by navdeep kaurNavdeep Kaur
 
97111 47426 Call Girls In Delhi MUNIRKAA
97111 47426 Call Girls In Delhi MUNIRKAA97111 47426 Call Girls In Delhi MUNIRKAA
97111 47426 Call Girls In Delhi MUNIRKAAjennyeacort
 
PULMONARY EMBOLISM AND ITS MANAGEMENTS.pdf
PULMONARY EMBOLISM AND ITS MANAGEMENTS.pdfPULMONARY EMBOLISM AND ITS MANAGEMENTS.pdf
PULMONARY EMBOLISM AND ITS MANAGEMENTS.pdfDolisha Warbi
 
Presentation on General Anesthetics pdf.
Presentation on General Anesthetics pdf.Presentation on General Anesthetics pdf.
Presentation on General Anesthetics pdf.Prerana Jadhav
 
Introduction to Sports Injuries by- Dr. Anjali Rai
Introduction to Sports Injuries by- Dr. Anjali RaiIntroduction to Sports Injuries by- Dr. Anjali Rai
Introduction to Sports Injuries by- Dr. Anjali RaiGoogle
 
call girls in munirka DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in munirka  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in munirka  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in munirka DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️saminamagar
 
world health day presentation ppt download
world health day presentation ppt downloadworld health day presentation ppt download
world health day presentation ppt downloadAnkitKumar311566
 

Último (20)

SWD (Short wave diathermy)- Physiotherapy.ppt
SWD (Short wave diathermy)- Physiotherapy.pptSWD (Short wave diathermy)- Physiotherapy.ppt
SWD (Short wave diathermy)- Physiotherapy.ppt
 
Apiculture Chapter 1. Introduction 2.ppt
Apiculture Chapter 1. Introduction 2.pptApiculture Chapter 1. Introduction 2.ppt
Apiculture Chapter 1. Introduction 2.ppt
 
Hematology and Immunology - Leukocytes Functions
Hematology and Immunology - Leukocytes FunctionsHematology and Immunology - Leukocytes Functions
Hematology and Immunology - Leukocytes Functions
 
PNEUMOTHORAX AND ITS MANAGEMENTS.pdf
PNEUMOTHORAX   AND  ITS  MANAGEMENTS.pdfPNEUMOTHORAX   AND  ITS  MANAGEMENTS.pdf
PNEUMOTHORAX AND ITS MANAGEMENTS.pdf
 
Let's Talk About It: To Disclose or Not to Disclose?
Let's Talk About It: To Disclose or Not to Disclose?Let's Talk About It: To Disclose or Not to Disclose?
Let's Talk About It: To Disclose or Not to Disclose?
 
The next social challenge to public health: the information environment.pptx
The next social challenge to public health:  the information environment.pptxThe next social challenge to public health:  the information environment.pptx
The next social challenge to public health: the information environment.pptx
 
Epilepsy
EpilepsyEpilepsy
Epilepsy
 
Measurement of Radiation and Dosimetric Procedure.pptx
Measurement of Radiation and Dosimetric Procedure.pptxMeasurement of Radiation and Dosimetric Procedure.pptx
Measurement of Radiation and Dosimetric Procedure.pptx
 
PERFECT BUT PAINFUL TKR -ROLE OF SYNOVECTOMY.pptx
PERFECT BUT PAINFUL TKR -ROLE OF SYNOVECTOMY.pptxPERFECT BUT PAINFUL TKR -ROLE OF SYNOVECTOMY.pptx
PERFECT BUT PAINFUL TKR -ROLE OF SYNOVECTOMY.pptx
 
See the 2,456 pharmacies on the National E-Pharmacy Platform
See the 2,456 pharmacies on the National E-Pharmacy PlatformSee the 2,456 pharmacies on the National E-Pharmacy Platform
See the 2,456 pharmacies on the National E-Pharmacy Platform
 
PULMONARY EDEMA AND ITS MANAGEMENT.pdf
PULMONARY EDEMA AND  ITS  MANAGEMENT.pdfPULMONARY EDEMA AND  ITS  MANAGEMENT.pdf
PULMONARY EDEMA AND ITS MANAGEMENT.pdf
 
COVID-19 (NOVEL CORONA VIRUS DISEASE PANDEMIC ).pptx
COVID-19  (NOVEL CORONA  VIRUS DISEASE PANDEMIC ).pptxCOVID-19  (NOVEL CORONA  VIRUS DISEASE PANDEMIC ).pptx
COVID-19 (NOVEL CORONA VIRUS DISEASE PANDEMIC ).pptx
 
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptxSYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
 
METHODS OF ACQUIRING KNOWLEDGE IN NURSING.pptx by navdeep kaur
METHODS OF ACQUIRING KNOWLEDGE IN NURSING.pptx by navdeep kaurMETHODS OF ACQUIRING KNOWLEDGE IN NURSING.pptx by navdeep kaur
METHODS OF ACQUIRING KNOWLEDGE IN NURSING.pptx by navdeep kaur
 
97111 47426 Call Girls In Delhi MUNIRKAA
97111 47426 Call Girls In Delhi MUNIRKAA97111 47426 Call Girls In Delhi MUNIRKAA
97111 47426 Call Girls In Delhi MUNIRKAA
 
PULMONARY EMBOLISM AND ITS MANAGEMENTS.pdf
PULMONARY EMBOLISM AND ITS MANAGEMENTS.pdfPULMONARY EMBOLISM AND ITS MANAGEMENTS.pdf
PULMONARY EMBOLISM AND ITS MANAGEMENTS.pdf
 
Presentation on General Anesthetics pdf.
Presentation on General Anesthetics pdf.Presentation on General Anesthetics pdf.
Presentation on General Anesthetics pdf.
 
Introduction to Sports Injuries by- Dr. Anjali Rai
Introduction to Sports Injuries by- Dr. Anjali RaiIntroduction to Sports Injuries by- Dr. Anjali Rai
Introduction to Sports Injuries by- Dr. Anjali Rai
 
call girls in munirka DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in munirka  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in munirka  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in munirka DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
world health day presentation ppt download
world health day presentation ppt downloadworld health day presentation ppt download
world health day presentation ppt download
 

How to Apply Your Taxonomy to Your Content Automatically

  • 1. How to apply yourHow to apply your taxonomy to yourtaxonomy to your contentcontent AUTOMATICALLYAUTOMATICALLY Marjorie M.K. Hlava President Access Innovations / Data Harmony mhlava@accessinn.com www.dataharmony.com www.accessinn.com 505-998-0800
  • 2. Three things to cover today 1. Operationalizing Auto Indexing (MAI) 2. Where is MAI using a taxonomy used now 3. How does the MAI work?
  • 3. Operationalizing the taxonomy • Expected organizational change • How is it put into place - Getting it applied Getting people to understand • Getting the organization to embrace the taxonomy • Two areas to consider – Technical aspects of implementation – Cultural Aspects of changes use patterns
  • 4. Expected organizational change • Find and discover ability improved! • Automatic implementation • Tagging everything • Document management plan • Make it part of the infrastructure • Thinking in outlines • Everyone uses the same outline - taxonomy
  • 5. Getting the organization to embrace the taxonomy  • Use the MAI as an aid –Indexing = applying the taxonomy terms –Automatic or background indexing –Index on Save or submit –Automatically index the old documents whenever they are seen or requested again • Make it easy for the users – auto in the background
  • 6. Assumptions • Taxonomy has been created • Tested on the documents, pages etc. • Agreed to by the major stakeholders • Is ready to apply to the records – There is a field to put this subject metadata in – The use is established – • Where the terms will be used • How the terms will be used • Maintenance is established – Search log feeds – Feedback from users – Wired into new product development needs
  • 7. Where is it put into place? • Tag / index content – Metadata records – Full text inline • Web navigation – Browse – navigation tree – NavBar – Search • Product catalogs • Customer Service – find the answers quickly
  • 8. Inline Tagging Show the exact point where the concept is mentioned Mouse-over to view the term record Statistical summary, showing the number of times each term is mentioned in the article Show the exact point where the concept is mentioned Mouse-over to view the term record Statistical summary, showing the number of times each term is mentioned in the article Copyright © 2013 Access Innovations, Inc.
  • 9. Taxonomy DrivenTaxonomy Driven Search PresentationSearch Presentation Copyright © 2013 Access Innovations, Inc.
  • 10.
  • 11. HTML Headers META NAME KEYWORD Use the taxonomy here
  • 12.
  • 13. Taxonomies in new ways • Mashups • Visualization of your content • Trend analysis • Content segmentation / repackaging • Author / people profiles • Place and people disambiguation • Data cross walks • In Sharepoint
  • 14. Authors at a placeMASHUP locations to a GPS grid of an area Two data points GPS Coordinates Taxonomy description of the Copyright © 2013 Access Innovations, Inc.
  • 16. All Data Up-posted To The Top Level Copyright © 2013 Access Innovations, Inc.
  • 17. Copyright © 2013 Access Innovations, Inc.
  • 18. Pattern Analysis Domain Associations Copyright © 2013 Access Innovations, Inc.
  • 19. More Like This - Recommender Cancer Epidemiology Biomarkers & Prevention Vol. 12, 161-164, February 2003 © 2003 American Association for Cancer Research Short Communications Alcohol, Folate, Methionine, and Risk of Incident Breast Cancer in the American Cancer Society Cancer Prevention Study II Nutrition Cohort Heather Spencer Feigelson1 , Carolyn R. Jonas, Andreas S. Robertson, Marjorie L. McCullough, Michael J. Thun and Eugenia E. Calle Department of Epidemiology and Surveillance Research, American Cancer Society, National Home Office, Atlanta, Georgia 30329-4251 Recent studies suggest that the increased risk of breast cancer associated with alcohol consumption may be reduced by adequate folate intake. We examined this question among 66,561 postmenopausal women in the American Cancer Society Cancer Prevention Study II Nutrition Cohort. Related Press Releases •How What and How Much We Eat (And Drink) Affects Our Risk •Novel COX-2 Combination Treatment May Reduce Colon Cance Combination Regimen of COX-2 Inhibitor and Fish Oil Causes C •COX-2 Levels Are Elevated in Smokers Related AACR Workshops and Conferences •Frontiers in Cancer Prevention Research •Continuing Medical Education (CME) •Molecular Targets and Cancer Therapeutics Related Meeting Abstracts •Association between dietary folate intake, alcohol intake, and m •Folate, folate cofactor, and alcohol intakes and risk for colorect •Dietary folate intake and risk of prostate cancer in a large prosp Related Working Groups •Finance •Charter •Molecular Epidemiology Related Education Book Content Oral Contraceptives, Postmenopausal Hormones, and Physical Activity and Cancer Hormonal Interventions: From Adjuvant Therapy to Breast Cancer Prevention Related Awards •AACR-GlaxoSmithKline Clinical Cancer Research Scholar Awards •ACS Award •Weinstein Distinguished Lecture Webcasts Related Webcasts Think Tank Report Related Think Tank Report Content Copyright © 2013 Access Innovations, Inc.
  • 20. Link to Society Resources Journal Article on Topic A Other Journal Articles on Topic A Upcoming Conference on Topic A Podcast Interview with Researcher Working on Topic A Grant Available for Researchers Working on Topic A CME Activity on Topic A Job Posting for Expert on Topic A Copyright © 2013 Access Innovations, Inc.
  • 21. Thesaurus MasterThesaurus Master Machine Aided Indexer (M.A.I.™) Reposito ry Term Store Reposito ry Term Store Search Presentation: 90% accuracy Browse by Subject Auto-completion Broader Terms Narrower Terms Related Terms Search Presentation: 90% accuracy Browse by Subject Auto-completion Broader Terms Narrower Terms Related Terms Inline Tagging Metadata and Entity Extractor Automatic Summarization Search Software Search Software Client taxonomy Client taxonomy Taxonomy In Sharepoint Copyright © 2013 Access Innovations, Inc.
  • 22. Editorial Workflow Integration Author Submission Module The author fills in the data to the document template, attaching images and graphs as necessary. An API calls Data Harmony and generates a list of indexing terms based on the content. Copyright © 2013 Access Innovations, Inc.
  • 23. A Quick Look Behind the Scenes Database Management System Thesaurus tool Indexing Tool MAI •Validate terms •Add terms and rules •Change terms and rules •Delete terms and rules •Search thesaurus •Validate term entry •Block invalid terms •Record candidates •Establish rules for term use •Auto Suggest indexing terms Copyright © 2013 Access Innovations, Inc.
  • 24. Database Management System Add Metadata using MAI Search Inverted File Aadvark Alligator Apple Advantage …. Zebra Record locator Accessinn.com/12345/demofile/recid15 Database records Each with many elements Portal Searching
  • 25. Data Feed Portal page Thesaurus / taxonomy system add Metadata Data Base Management System Automatic Metadata Suggestions Search
  • 26. Data Base Management System Add Metadata using MAI Search Use taxonomy to search
  • 27. • Consistency – Machine Aided Indexing consistently suggests the same term under the same conditions. – Initial tests on units from client proved to be 20% more consistent than manual indexing. • Deeper Indexing • Machine Aided Indexing may suggest more terms per unit than manual indexers working under a per unit quota. • Faster Processing • Human verification of suggested indexing terms can be seven (7) times as fast as manual indexing. • Fully automatic indexing can run 8 million articles in 2 hours in the Amazon Cloud Service • Full text indexing article(30 page articles) at less than a minute per article07/25/13 Copyright (c) 1998 Access Innovations, Inc. Advantages of MAI
  • 28. 07/25/13 Copyright (c) 1998 Access Innovations, Inc. Use of Thesaurus • Faster Rule Building • Determines Depth of Indexing • Increases Measurable Consistency • Provides Base for Statistical Computations
  • 29. 07/25/13 Copyright (c) 1998 Access Innovations, Inc. MAI System Overview -- System Components 1. Thesaurus / Taxonomy 2. MAI Program a. Rule Builder b. MAI Engine (matches text to terms) c. Statistics Computation 3. Knowledge Base / Rule base 4. Editorial Evaluation
  • 30.
  • 31. How does auto indexing work? • The Modules • The syntax • The process • How do you know if it’s good? Statistics / evaluation • Rules of thumb
  • 32. 07/25/13 Copyright (c) 1998 Access Innovations, Inc. Knowledge Base Creative Loop
  • 33. 07/25/13 Copyright (c) 1998 Access Innovations, Inc. Edit • Interactive text editor for building rules • Provides insertion, deletion, copying of characters or marked blocks Reports • Creates Knowledge Base listings • Can output all rules, or selected rules MAI - Rule Builder
  • 34. 07/25/13 Copyright (c) 1998 Access Innovations, Inc. MAI – Concept Extractor Input • Reads client’s electronic file format • Creates internal format for scanning • Modified project by project Interpreter • Scans reformatted input, selects phrases • Looks up text phrases in Knowledge Base • Interprets rule(s) associated with matched phrase Output • Produces electronic file of scanned units with suggested index terms for production of print file • Optionally produces data file for statistics program
  • 35. 07/25/13 Copyright (c) 1998 Access Innovations, Inc. MAI - Statistics Computation How do you know if indexing is accurate? • Accepts optional data file created by MAI engine while processing units with existing index terms • Computes MAI ‘Hit’, ‘Miss’, ‘Noise’ figures Reports • Produces listing of terms with unit numbers categorized as ‘Hit’, ‘Miss’, or ‘Noise’
  • 36. Rules Type • Simple Condition (80%) • Identity • synonym • Complex Rules: • Compound Condition • Multiple Condition Explanation • If (condition is true) use term ‘A’ • Use term ‘B’ where ‘A’ (matched text) is synonymous with, but not identical to, ‘B’ • If (condition 1 is true and/or condition 2 is true…and/or condition n is true) use term ‘A’ • If (condition 1 is true…) use term ‘A’ else if (condition 2 is true…) use term ‘B’ else if (condition n is true…) use term ‘C’ 07/25/13 Copyright (c) 1998 Access Innovations, Inc. Rule Types
  • 37. • Condition Type • Proximity • Location • Format • Truncation • Explanation • Helps to determine context of a phrase by its relative location to another indicative phrase. • Determines where in the unit a phrase occurs • Checks the matched text for certain conditions such as capitalization, format (italic codes) or affixes (prefix or suffix to root word). • Left or right word stem isolation07/25/13 Copyright (c) 1998 Access Innovations, Inc. More options
  • 38. • NEAR “string” – If the matched text is within 3 words before or after another “string” in the same sentence. • WITH “string” – If the matched text is in the same sentence as another “string” • MENTIONS “string” – If the matched text is in the same abstract as another “string” • AROUND “string” – If the matched text is in the same text as another “string” 07/25/13 Proximity Conditions
  • 39. • in title – If the matched text is in “field” • in abstract – If the matched text is in “field” • begin sentence – If the matched text is located at the beginning of the sentence • end sentence – If the matched text is located at the end of the sentence 07/25/13 Copyright (c) 1998 Access Innovations, Inc. Location Conditions
  • 40. • all caps – If the matched text is all caps • initial caps – If the matched text begins with a capital letter • match “string 1”, – If the matched text begins with, ends “string 2” with, or is included within specific character strings(s), such as “<i>”, “</i>” or quotation sets 07/25/13 Format Conditions
  • 41. • Affix “string1”, – • If the matched text is a root word in“string2” the Knowledge Base which is prefixed (string1) or suffixed (string2) by specific character string(s), such as “pseudo” or “ization”. • Either string1 or string2 may be null(“”). • Additionally, it will allow wild card characters such as “*”. 07/25/13 Format Conditions (
  • 42. 07/25/13 Simple Rules: Identity text: artifact rule: use artifact text: accelerator mass spectrometry rule: use accelerator mass spectrometry text: National Socialism rule: use National Socialism use fascism
  • 43. 07/25/13 Simple Rules: Synonym text: testing rule: use analysis text: the history of rule: use history text: micrographs rule: use scanning electron microscopy text: casoni rule: use adobe
  • 44. 07/25/13 Complex Rules: Simple Condition Automatically created text: cave rule: if (near “Sandia”) use Sandia cave site text: packing rule: if (near “shipping”) use packing and shipping text: chin rule: if (all caps)use Canadian Heritage Information Network
  • 45. 07/25/13 Complex Rules: Compound Condition text: congress rule: if (initial caps and not begin sentence) and (with “u.s.” or with “united states”)) use U.S. Congress text: dye rule: if (in title and (near “natural” or with “vegetable”)) use natural dyeing text: x-ray rule: if (near “energy dispersive” or mentions “(xrd)” or (near “fluorescence” and with “spectroscopy”)) use energy-dispersive x-ray fluorescence spectroscopy
  • 46. 07/25/13 Copyright (c) 1998 Access Innovations, Inc. Complex MAI Rule Rule: survey 12804: IF (near "geological" OR with "geophysics" OR with "oil recovery") USE geophysical prospecting ENDIF IF (mentions "interstellar" OR mentions "star" OR mentions "stars" OR mentions "Universe" OR mentions "stellar" OR mentions "galaxies" OR mentions "protostars" OR mentions "protostar" OR mentions "galactic" OR mentions "galaxy" OR mentions "radio source" OR mentions "HII regions" OR mentions "nebula" OR mentions "nova" OR mentions "supernova" OR mentions "nebulae" OR mentions "novae") USE astronomy and astrophysics ENDIF IF (mentions "discusses" OR mentions "presents" OR mentions "the author" OR mentions "this article" OR mentions "this book" OR mentions "this paper" OR mentions "the article" OR mentions "the book" OR mentions "the paper") USE reviews ENDIF
  • 47. • HIT – When the MAI engine generates an indexing term which is identical to an index term which would have been assigned by a human indexer • MISS – When the MAI engine fails to generate an indexing term which would have been assigned by a human indexer. • NOISE – When the MAI engine generates an indexing term which is incorrect, out of context, or illogical. 07/25/13 MAI Statistics Terms
  • 48. • CONSISTENCY – When topical or descriptive indexing terms are assigned in the same way from article to article over time. • DEPTH – A qualitative measure of the level and range of detail of the indexing (ie, from broader terms to narrower terms of a single concept). • BREADTH – A qualitative measure of the number or range of concepts indexed in an article. 07/25/13 Copyright (c) 1998 Access Innovations, Inc. MAI Statistics Terms
  • 49. 07/25/13 Copyright (c) 1998 Access Innovations, Inc.
  • 50. • Aim for 85% of higher accuracy – Hits of misses and noise • Maintain monthly – Thesaurus / taxonomy updates – From search logs – From new sources covered – From new initiatives and term usage • Maintain MAI with term changes and additions – 80% of rules are automatic match rules – Should create 6 – 10 rules per hour • Do regular random sampling to improve the indexing quality 1% • Reindex when taxonomy changes reach 5% of the term base • Hire people good at logic and word games 07/25/13 Copyright (c) 1998 Access Innovations, Inc. Rules of thumb
  • 51. Cost items • Software – Two main types – • Bayesian / statistical / co-occurrence • Rules based • Educating the system – Training the system based on collection of records where the terms are used correctly – 20% of terms may need disambiguation • Implementing the system – Setting the vectors (programmers) – Applying the rules (editors / taxonomists) • Ongoing maintenance – Resetting the vectors (programmers) – Updating the terms and rules (editors / taxonomists)
  • 52. • A. MAI System Components • Thesaurus • Knowledge Base • MAI Program • Editorial Evaluation • B. Use of Thesaurus • Word Standardization • Faster Rule Building • Increased Measurable Consistency 07/25/13 Copyright (c) 1998 Access Innovations, Inc. MAI Summary
  • 53. • C. Knowledge Base Components • Simple Rules • Complex Rules • D. MAI Program Modules • Rule Builder Module • MAI Engine Module • Statistics Computation • E. Advantages of Rules based MAI • Consistency • Deeper Indexing • Faster Thru Put 07/25/13 Copyright (c) 1998 Access Innovations, Inc. MAI Summary
  • 54. Taxonomy Is core to the system
  • 55. Thank youThank you Marjorie M.K. Hlava President Access Innovations / Data Harmony mhlava@accessinn.com www.dataharmony.com www.accessinn.com 505-998-0800

Notas del editor

  1. NICEM, the National Information Center for Educational Media, is a database of over 640,000 audio and visual items in all subject areas that apply to learning, from preschool through professional. Access Innovations applies subject terms from the NICEM taxonomy to each bibliographic record. In addition to that backend data work, we also created a search and presentation layer for the website, which we call Search Harmony. Here are some of the user-friendly features that are included in Search Harmony: [Click] Users can navigate the site by browsing the full taxonomy, and see the number of records tagged with each subject [CLICK] Auto-completion of search terms, which is a common feature of many search engines, but in this case the user is assisted in formulating a search by seeing a pick list of terms from the taxonomy, including all synonyms – even if they appear in the middle of a phrase. [CLICK] The user is also guided to expand the topic with broader terms or related terms from the taxonomy, or narrow the search to find more precise information. [CLICK] The resulting site has been recognized as an indispensable resource for educators.
  2. Thanks to Helen Atkins of AACR for this illustration. The real power of this is that the links can all go in all directions, so we take advantage of having the user’s attention regardless of how they step into our “web”