Finding a Common Language:
Bringing Complex and Disparate
     Vocabularies Together


             Paula R. McCoy
     Manager, Taxonomy Development
                ProQuest
       paula.mccoy@proquest.com
Part of Cambridge Information Group & CSA

Headquartered in Ann Arbor, Michigan
Editorial offices in Louisville, Kentucky
Access to over 125 billion digital pages of content from
  magazine, trade, & scholarly publications, current &
historical newspapers, original materials such as annual
   reports & civil war pamphlets, and daily wire feeds

  Subscription-based ProQuest® online information
   service available in academic and public libraries
Louisville editors abstract & index 4,000+
periodicals & newspapers

ProQuest Controlled Vocabulary used to index
subjects; Authority Files used to index
company, geographic, personal, product names

CV applied to non-periodical & third-party
content via mapping, to allow cross-searching
of multiple DBs with one vocabulary
Topics of Discussion
Description of ProQuest Controlled
Vocabulary & Authority Files

Taxonomy Management -- Overview

Life Before Synaptica

Thesaurus Management System Purchase
Implementing Synaptica

Life With Synaptica

Q&A
PQ CV




        ProQuest Controlled Vocabulary

         Natural language, hierarchical vocabulary complying
         with ANSI/NISO Standard Z39.19 (Guidelines for
         the Construction, Format, and Management of
         Monolingual Controlled Vocabularies)

         Created in 1970s for ABI/INFORM business database

         Based on Library of Congress Subject Headings




        ProQuest Controlled Vocabulary
        Merged with general reference vocabulary in 1980s
        Major development effort in past 4 years to boost
        science, education & medical terms
        Thesaurus subjects:
          Business, economics & trade – 4,300 terms
          Science, math & technology – 1,600 terms
          Medicine – 1,150 terms
          Humanities – 960 terms
          Government & policy – 850 terms
          Education – 400 terms




        ProQuest CV: Statistics

          Preferred terms: 11,046
          Non-preferred terms: 5,631
          Scope Notes: 3,194 (29%)
          Cross-references (Broader,
          Narrower, Related terms): 67,700
          Terms added in 2007: 77
          Terms added in 2008: 58+




        Authority Files: Statistics

         Corporate/Organization Names: 438,098
         Names added in 2008: 5,489

         Personal Names: 416,239
         Names added in 2008: 1,526

         Geographic (Location) Names: 34,331
         Names added in 2008: 144

         Product Names: 38,210
         Names added in 2008: 54
Taxonomy Management




                The Taxonomy Manager’s Job

                      Add subject terms as dictated by new
                      concepts & new content to index

                      Maintain hierarchies & Scope Notes

                      Load updated Thesaurus to ProQuest interface

                      Manage authority files to maintain standards
                      & control file size




                The Taxonomy Manager’s Job

                                OBJECTIVE:

         To ensure that indexers and searchers alike have access to a
         complete and accurate Thesaurus that they can use to
         maximize the discoverability of documents in ProQuest




                      Thesaurus on ProQuest®




                        Sample Subject Term

          Chronic obstructive pulmonary disease
              [Preferred, or main term]
          SN: Any lung disease, such as chronic bronchitis or
          emphysema, causing obstruction of bronchial airflow
              [Scope note defining term and how it is used]
           UF COPD
              [Non-preferred term: points to term used to index]
           BT Disease
           BT Respiratory diseases
              [Terms broader in nature to main term: COPD is a
              disease, and specifically, a respiratory disease]
           NT Asthma
           NT Bronchitis
           NT Emphysema
              [Terms narrower in nature to main term: these are
              chronic lung diseases]
           RT Airway management
           RT Lungs
              [Terms related to main term that might be used to
              narrow the search]
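
The sample entry above can be modeled as a small record. The following is a minimal Python sketch, not ProQuest's or Synaptica's actual data model: the class name Term, its field names, and the resolve helper are all assumptions made for illustration. It shows how a non-preferred term (UF) points a searcher or indexer to the preferred term.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One thesaurus entry; fields mirror the tags on the slide (hypothetical model)."""
    preferred: str                                  # main term used to index
    scope_note: str = ""                            # SN
    used_for: list = field(default_factory=list)    # UF: non-preferred terms
    broader: list = field(default_factory=list)     # BT
    narrower: list = field(default_factory=list)    # NT
    related: list = field(default_factory=list)     # RT


def resolve(query, terms):
    """Return the preferred term for a query, following UF pointers."""
    wanted = query.casefold()
    for term in terms:
        if wanted == term.preferred.casefold():
            return term.preferred
        if any(wanted == uf.casefold() for uf in term.used_for):
            return term.preferred
    return None  # not in the vocabulary


copd = Term(
    preferred="Chronic obstructive pulmonary disease",
    scope_note=("Any lung disease, such as chronic bronchitis or emphysema, "
                "causing obstruction of bronchial airflow"),
    used_for=["COPD"],
    broader=["Disease", "Respiratory diseases"],
    narrower=["Asthma", "Bronchitis", "Emphysema"],
    related=["Airway management", "Lungs"],
)

print(resolve("copd", [copd]))  # Chronic obstructive pulmonary disease
```

Matching is case-insensitive here only as a convenience; a real system would likely also need to handle punctuation and spelling variants.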
Before Synaptica


         Managing terms meant:

Multiple files  Duplicate entries  Errors

 = less than ideal thesaurus management




                   MS Word Document




                   Vocabulary Documents in Word

                      ProQuest controlled vocabulary
                      French-language controlled vocabulary
                      German-language controlled vocabulary
                      Spanish-language controlled vocabulary
                      Combined PQ-CBCA controlled vocabulary
                      Ethnic database vocabulary, English
                      Ethnic database vocabulary, Spanish




                   Foreign-Language Vocabularies




               French         German       Spanish




                   Oracle Database Forms




                   Authority Files in Oracle

                    Class codes (related to subjects)
                    CORP names (391,665+ terms)
                    NAIC codes (related to companies)
                    GEOG names (32,000+ terms)
                    PERS names (350,000+ terms)
                    PROD names (38,000+ terms)




                          Adding New Terms

                   1. Enter full term hierarchy into new Word doc
                   2. Copy term into main Word-based vocabulary &
                      enter reciprocal relationships
                   3. Enter term & relationships into Oracle
                   4. Review next-day report on Oracle activity
                   5. Send new term doc to editors via e-mail
                   6. Print new vocabulary (at least every two years)
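
Step 2 above, entering reciprocal relationships by hand, is the kind of work that is easy to automate. The sketch below is only an illustration under stated assumptions: the Vocabulary class, its relate and entry methods, and the INVERSE table are hypothetical names, not part of Word, Oracle, or Synaptica. It records each relationship together with its inverse, so that adding "A BT B" also creates "B NT A".

```python
from collections import defaultdict

# Inverse of each relationship tag: BT <-> NT, RT <-> RT, UF <-> USE.
INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "USE", "USE": "UF"}


class Vocabulary:
    """Hypothetical in-memory vocabulary that keeps relationships reciprocal."""

    def __init__(self):
        # term -> relationship tag -> set of target terms
        self.links = defaultdict(lambda: defaultdict(set))

    def relate(self, term, tag, target):
        """Record a relationship and its reciprocal in one step."""
        self.links[term][tag].add(target)
        self.links[target][INVERSE[tag]].add(term)

    def entry(self, term):
        """Return a term's relationships as sorted lists, keyed by tag."""
        return {tag: sorted(targets) for tag, targets in self.links[term].items()}


vocab = Vocabulary()
main = "Chronic obstructive pulmonary disease"
vocab.relate(main, "BT", "Respiratory diseases")
vocab.relate(main, "NT", "Emphysema")
vocab.relate(main, "RT", "Lungs")
vocab.relate(main, "UF", "COPD")

# The broader term now lists the new term as a narrower term automatically.
print(vocab.entry("Respiratory diseases"))
```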
                   [Diagram labels: SN, BT, Class Code, UF, NT, RT. Whew!]
TMS Purchase




               Thesaurus Management Systems

                       Buying Criteria
            1. Ability to interact in real time with editorial system
            2. Ability to accommodate authority files of 400,000+
               names

                         Synaptica
          Up to 40 admin & 100 read-only users in multiple locations
          Ability to load vocabs from multiple Word docs & Oracle
          authority files
          Support for foreign-language vocabularies
          Ability to add new vocabularies
          Vendor onsite installation & training
          Software upgrades & tech support
Implementing Synaptica




                         Implementing Synaptica

                   Contract signed and work begun in August 2004

                   PQ sent to Synaptica all the Word & Oracle files for
                   analysis

                   Decision points: how to load & structure data;
                   how to handle “suspect” or erroneous
                   relationships




                         Synaptica Data Analysis
                          Relationship Validation Tests:

                            Term Uniqueness
                            Use Violations
                            Self-Referencing Relationships
                            One Relationship per Term Pair
                            Relationship Unique
                            Relationship Reciprocates
                            Circular References

     Exception Reports delivered to PQ; Errors fixed before production
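
Several of the tests listed above can be approximated in a few lines. The sketch below is one reading of what the test names mean, not Synaptica's actual implementation; the validate function, the triple format for relationships, and the sample data are all assumptions. It covers term uniqueness, self-referencing relationships, one relationship per term pair, reciprocation, and circular references in the BT hierarchy.

```python
from collections import Counter

INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT"}


def validate(terms, relations):
    """Return a list of (test name, detail) exceptions.

    terms: list of term strings.
    relations: list of (source, tag, target) triples, tag in BT/NT/RT.
    """
    problems = []
    rel_set = set(relations)

    # Term uniqueness: the same term (ignoring case) entered more than once.
    counts = Counter(t.casefold() for t in terms)
    for name, n in sorted(counts.items()):
        if n > 1:
            problems.append(("term uniqueness", name))

    for src, tag, dst in sorted(rel_set):
        # Self-referencing relationship: a term related to itself.
        if src == dst:
            problems.append(("self-reference", f"{src} {tag} {dst}"))
            continue
        # Reciprocation: A BT B requires B NT A, and so on.
        if (dst, INVERSE[tag], src) not in rel_set:
            problems.append(("missing reciprocal", f"{src} {tag} {dst}"))

    # One relationship per term pair: e.g. A BT B and A RT B together.
    tags_by_pair = {}
    for src, tag, dst in rel_set:
        tags_by_pair.setdefault((src, dst), set()).add(tag)
    for (src, dst), tags in sorted(tags_by_pair.items()):
        if len(tags) > 1:
            problems.append(("multiple relationships", f"{src} / {dst}"))

    # Circular reference: following BT links leads back to the start.
    parents = {}
    for src, tag, dst in rel_set:
        if tag == "BT" and src != dst:
            parents.setdefault(src, set()).add(dst)
    for start in sorted(parents):
        stack, seen = list(parents[start]), set()
        while stack:
            node = stack.pop()
            if node == start:
                problems.append(("circular reference", start))
                break
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, ()))
    return problems


# Invented sample data with deliberate errors.
terms = ["Disease", "Respiratory diseases", "Asthma", "asthma"]
relations = [
    ("Respiratory diseases", "BT", "Disease"),
    ("Disease", "NT", "Respiratory diseases"),
    ("Asthma", "BT", "Respiratory diseases"),  # reciprocal NT is missing
    ("Disease", "BT", "Asthma"),               # closes a BT loop
    ("Lungs", "RT", "Lungs"),                  # self-reference
]
found = validate(terms, relations)
for test, detail in found:
    print(f"{test}: {detail}")
```

The output of such a run plays the role of the exception report: a list of suspect entries for an editor to review, rather than automatic corrections.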




                         Use Validation Error

                            Marine resources




                         Foreign-Language Errors

           Terms with no language equivalent (LEQ), i.e., no translation

           In all 3 languages, multiple English terms with the same
            translation, e.g.:

         English term          French term      French term-revised
          Purchasing            Achats
          Shopping              Achats           Shopping
          Buyers                Acheteurs
          Purchasing agents     Acheteurs        Agents d'achat
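
Both kinds of error described above can be found mechanically once the language equivalents are in a lookup table. This is a minimal sketch under assumptions: the check_equivalents function and the dictionary layout are hypothetical, and the data is just the French example from the table. It lists English terms with no translation and translations shared by more than one English term.

```python
from collections import defaultdict


def check_equivalents(leq):
    """leq maps each English term to its translation ('' or None if missing).

    Returns (missing, clashes): English terms without a translation, and
    translations shared by more than one English term.
    """
    missing = sorted(en for en, tr in leq.items() if not tr)
    by_translation = defaultdict(list)
    for en, tr in leq.items():
        if tr:
            by_translation[tr.casefold()].append(en)
    clashes = {tr: sorted(ens) for tr, ens in by_translation.items() if len(ens) > 1}
    return missing, clashes


# Original French equivalents from the table above.
french = {
    "Purchasing": "Achats",
    "Shopping": "Achats",
    "Buyers": "Acheteurs",
    "Purchasing agents": "Acheteurs",
}
missing, clashes = check_equivalents(french)
print(clashes)

# After the revisions shown in the table, the clashes disappear.
revised = dict(french, **{"Shopping": "Shopping", "Purchasing agents": "Agents d'achat"})
print(check_equivalents(revised))
```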




                            Final Challenge

              Issue:     Different editorial systems = 2x data
                         entry: once for Synaptica, once for Oracle

              Solution: Overnight synchronization process to copy
                        Synaptica work into Oracle every night

                         Synch process discontinued April 2008




             Putting Synaptica Into Production
                                   Nov 2004

                Train users — provide documentation & hands-on
                demonstrative training

                Deal with people resistant to change

                Encourage written feedback on system functionality
                Send feedback to Synaptica – many of our suggestions
                implemented in later versions
Life With Synaptica




                        Life With Synaptica
                      Terms Management Made Easy!




              Word – Old, Bad    Synaptica – New, Good 




              Adding Terms Today: 3 Easy Steps

                      1. Enter term and relationships into Synaptica
                         “Item Details” window

                      2. Export report of new terms into Word

                      3. Send Word document to editors




            Improving Thesaurus Management
                      Categories Feature




                      Subject Term Categories




         CORP Names – Categories & Website




                  Foreign-Language Vocabularies



                                            Language
                                           Equivalents




                  Foreign-Language Vocabularies




                  Foreign-Language Vocabularies
                  [Figure: Spanish, German, and French vocabularies,
                  alphabetical by language]




                               Synaptica Updates

                        Synaptica version 6.0 released in early 2006

                        Synaptica version 7.0 is being implemented now:

                      • Enhanced user interface
                      • Semantic Web standardization (RDF, OWL, SKOS) and
                         Web Services integration
                      • Expanded Reporting functionality
                      • Enhanced adding and editing of term relationships
                         including “rapid-fire” simple drag-and-drop editing
                      • Improved global term editing
                      • Online help and user guides
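
To make the Semantic Web item in the list above concrete, here is a rough sketch of how a thesaurus entry could be written out as SKOS in Turtle syntax. It is not Synaptica's export format: the to_turtle, slug, and quote functions and the example.org namespace are assumptions. Only the SKOS property names (prefLabel, altLabel, scopeNote, broader, narrower, related) come from the W3C vocabulary; mapping UF to altLabel and SN to scopeNote is one common choice.

```python
import re

BASE = "http://example.org/vocab/"  # hypothetical namespace, not a real ProQuest URI


def slug(label):
    """Turn a label into a URI-safe local name."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def quote(text):
    """Escape a string for use as a Turtle literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_turtle(label, sn="", uf=(), bt=(), nt=(), rt=()):
    """Render one term as a skos:Concept in Turtle."""
    props = [f"skos:prefLabel {quote(label)}@en"]
    props += [f"skos:altLabel {quote(x)}@en" for x in uf]          # UF
    if sn:
        props.append(f"skos:scopeNote {quote(sn)}@en")             # SN
    for prop, targets in (("broader", bt), ("narrower", nt), ("related", rt)):
        props += [f"skos:{prop} <{BASE}{slug(t)}>" for t in targets]
    body = " ;\n    ".join(["a skos:Concept"] + props)
    return f"<{BASE}{slug(label)}>\n    {body} .\n"


out = to_turtle(
    "Chronic obstructive pulmonary disease",
    uf=["COPD"],
    bt=["Respiratory diseases"],
    nt=["Emphysema"],
    rt=["Lungs"],
)
print("@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n")
print(out)
```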




                             Benefits of Synaptica
                      Greater awareness of thesaurus standards and
                      terminology, e.g.: “preferred” and “non-preferred”
                      instead of Use and Used For
                      Long-needed updating and improvement in term
                      hierarchies; ability to provide thesaurus statistics
                      Increase in Company name NPTs (non-preferred terms),
                       from 1,935 to 8,952 today
                      Immediate responsiveness to indexer needs —
                       real-time term additions, esp. NPTs and SNs
                      Easier loading of updated Thesaurus on PQ interface
Questions?

Thank you!

ProQuest Taxonomy Boot Camp Presentation 2008

  • 1. Finding a Common Language: Bringing Complex and Disparate Vocabularies Together Paula R. McCoy Manager, Taxonomy Development ProQuest paula.mccoy@proquest.com
  • 2. Part of Cambridge Information Group & CSA Headquartered in Ann Arbor, Michigan Editorial offices in Louisville, Kentucky
  • 3. Access to over 125 billion digital pages of content from magazine, trade, & scholarly publications, current & historical newspapers, original materials such as annual reports & civil war pamphlets, and daily wire feeds Subscription-based ProQuest® online information service available in academic and public libraries
  • 4. Louisville editors abstract & index 4,000+ periodicals & newspapers ProQuest Controlled Vocabulary used to index subjects; Authority Files used to index company, geographic, personal, product names CV applied to non-periodical & third-party content via mapping, to allow cross-searching of multiple DBs with one vocabulary
  • 5. Topics of Discussion Description of ProQuest Controlled Vocabulary & Authority Files Taxonomy Management -- Overview Life Before Synaptica Thesaurus Management System Purchase Implementing Synaptica Life With Synaptica Q&A
  • 6. PQ CV ProQuest Controlled Vocabulary Natural language, hierarchical vocabulary complying with ANSI/NISO Standard Z39.19 (Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies) Created in 1970s for ABI/INFORM business database Based on Library of Congress Subject Headings
  • 7. PQ CV ProQuest Controlled Vocabulary Merged with general reference vocabulary in 1980s Major development effort in past 4 years to boost science, education & medical terms Thesaurus subjects: Business, economics & trade – 4300 terms Science, math & technology – 1600 terms Medicine – 1150 terms Humanities – 960 terms Government & policy – 850 terms Education – 400 terms
  • 8. PQ CV ProQuest CV: Statistics Preferred terms: 11,046 Non-preferred terms: 5631 Scope Notes: 3194 (29%) Cross-references (Broader, Narrower, Related terms): 67,700 Terms added in 2007: 77 Terms added in 2008: 58+
  • 9. PQ CV Authority Files: Statistics Corporate/Organization Names: 438,098 Names added in 2008: 5489 Personal Names: 416,239 Names added in 2008: 1526 Geographic (Location) Names: 34,331 Names added in 2008: 144 Product Names: 38,210 Names added in 2008: 54
  • 10. Taxonomy Management The Taxonomy Manager’s Job Add subject terms as dictated by new concepts & new content to index Maintain hierarchies & Scope Notes Load updated Thesaurus to ProQuest interface Manage authority files to maintain standards & control file size
  • 11. Taxonomy Management The Taxonomy Manager’s Job OBJECTIVE: To ensure that indexers and searchers alike have access to a complete and accurate Thesaurus that they can use to maximize the discoverability of documents in ProQuest
  • 12. Taxonomy Management Thesaurus on ProQuest®
  • 13. Taxonomy Management Sample Subject Term: Chronic obstructive pulmonary disease (preferred, or main term). SN: Any lung disease, such as chronic bronchitis or emphysema, causing obstruction of bronchial airflow (scope note defining the term and how it is used). UF COPD (non-preferred term: points to the term used to index). BT Disease; BT Respiratory diseases (terms broader in nature than the main term: COPD is a disease, and specifically, a respiratory disease). NT Asthma; NT Bronchitis; NT Emphysema (terms narrower in nature than the main term: these are chronic lung diseases). RT Airway management; RT Lungs (terms related to the main term that might be used to narrow the search).
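The sample subject term on slide 13 can be pictured as a small record. The sketch below is only an illustration, not how ProQuest or Synaptica stores terms: the `Term` class and the `resolve` helper are hypothetical names, and only the COPD data comes from the slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term with its thesaurus relationships (hypothetical model)."""
    name: str                                             # preferred, or main term
    scope_note: str = ""                                  # SN
    used_for: list[str] = field(default_factory=list)     # UF: non-preferred terms
    broader: list[str] = field(default_factory=list)      # BT
    narrower: list[str] = field(default_factory=list)     # NT
    related: list[str] = field(default_factory=list)      # RT


# Data taken from the slide's sample subject term.
copd = Term(
    name="Chronic obstructive pulmonary disease",
    scope_note=("Any lung disease, such as chronic bronchitis or emphysema, "
                "causing obstruction of bronchial airflow"),
    used_for=["COPD"],
    broader=["Disease", "Respiratory diseases"],
    narrower=["Asthma", "Bronchitis", "Emphysema"],
    related=["Airway management", "Lungs"],
)


def resolve(query: str, terms: list[Term]) -> str | None:
    """Map a preferred or non-preferred term to the preferred term used to index."""
    q = query.casefold()
    for term in terms:
        if q == term.name.casefold() or q in (u.casefold() for u in term.used_for):
            return term.name
    return None
```

With this model, `resolve("COPD", [copd])` returns the preferred term, which is the role the UF entry plays for searchers and indexers.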
  • 14. Before Synaptica Managing terms meant: multiple files, duplicate entries, errors = less than ideal thesaurus management
  • 15. Before Synaptica MS Word Document
  • 16. Before Synaptica Vocabulary Documents in Word ProQuest controlled vocabulary French-language controlled vocabulary German-language controlled vocabulary Spanish-language controlled vocabulary Combined PQ-CBCA controlled vocabulary Ethnic database vocabulary, English Ethnic database vocabulary, Spanish
  • 17. Before Synaptica Foreign-Language Vocabularies French German Spanish
  • 18. Before Synaptica Oracle Database Forms
  • 19. Before Synaptica Authority Files in Oracle Class codes (related to subjects) CORP names (391,665+ terms) NAIC codes (related to companies) GEOG names (32,000+ terms) PERS names (350,000+ terms) PROD names (38,000+ terms)
  • 20. Before Synaptica Adding New Terms 1. Enter full term hierarchy into new Word doc 2. Copy term into main Word-based vocabulary & enter reciprocal relationships 3. Enter term & relationships into Oracle 4. Review next-day report on Oracle activity 5. Send new term doc to editors via e-mail 6. Print new vocabulary (at least every two years)
  • 21. SN BT Class Code [whew!] UF NT RT
  • 22. TMS Purchase Thesaurus Management Systems Buying Criteria (Synaptica): Up to 40 admin & 100 read-only users in multiple locations; ability to interact in real time with editorial system; ability to load vocabs from multiple Word docs & Oracle authority files; ability to accommodate authority files of 400,000+ names; support for foreign-language vocabularies; ability to add new vocabularies; vendor onsite installation & training; software upgrades & tech support
  • 23. Implementing Synaptica Implementing Synaptica Contract signed and work begun in August 2004 PQ sent to Synaptica all the Word & Oracle files for analysis Decision points: how to load & structure data; how to handle “suspect” or erroneous relationships
  • 24. Implementing Synaptica Synaptica Data Analysis Relationship Validation Tests: Term Uniqueness; Use Violations; Self-Referencing Relationships; One Relationship per Term Pair; Relationship Unique; Relationship Reciprocates; Circular References. Exception Reports delivered to PQ; errors fixed before production
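Three of the validation tests named on slide 24 can be sketched in a few lines. This is an assumed illustration of what such checks might look like, not Synaptica's actual implementation: the `(term, relationship, target)` tuple layout, the function names, and the sample data (including "Term A" and "Term B") are all hypothetical.

```python
# Each relationship is (term, kind, target); kind is "BT", "NT" or "RT".
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT"}


def self_references(rels):
    """Relationships in which a term points at itself."""
    return [r for r in rels if r[0] == r[2]]


def missing_reciprocals(rels):
    """Relationships whose reverse (BT<->NT, RT<->RT) is not recorded."""
    present = set(rels)
    return [(a, kind, b) for (a, kind, b) in rels
            if (b, RECIPROCAL[kind], a) not in present]


def circular_references(rels):
    """Terms that can reach themselves by following BT links upward."""
    parents = {}
    for a, kind, b in rels:
        if kind == "BT":
            parents.setdefault(a, set()).add(b)
    cyclic = []
    for start in parents:
        seen, stack = set(), list(parents[start])
        while stack:
            node = stack.pop()
            if node == start:
                cyclic.append(start)
                break
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, ()))
    return sorted(cyclic)


# Hypothetical sample data with one error of each kind.
sample = [
    ("Asthma", "BT", "Respiratory diseases"),
    ("Respiratory diseases", "NT", "Asthma"),
    ("Respiratory diseases", "BT", "Disease"),   # reciprocal NT is missing
    ("Lungs", "RT", "Lungs"),                    # self-reference
    ("Term A", "BT", "Term B"),                  # circular pair
    ("Term B", "BT", "Term A"),
]
```

Each function returns the offending entries, so its output could feed an exception report of the kind the slide says was delivered to PQ.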
  • 25. Implementing Synaptica Use Validation Error Marine resources
  • 26. Implementing Synaptica Foreign-Language Errors: Terms with no language equivalent (LEQ), e.g., no translation. In all 3 languages, multiple English terms with the same translation, e.g. (English term / French term / French term-revised): Purchasing / Achats / (blank); Shopping / Achats / Shopping; Buyers / Acheteurs / (blank); Purchasing agents / Acheteurs / Agents d'achat
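Both foreign-language errors on slide 26 are easy to detect mechanically. The sketch below assumes the language equivalents are available as a simple English-to-translation mapping, which is an assumption made for illustration. The function names are hypothetical, the four French pairs are the unrevised ones from the slide, and "Hypothetical term" is made up to show the missing-LEQ case.

```python
from collections import defaultdict


def missing_equivalents(leq):
    """English terms that have no language equivalent (LEQ) recorded."""
    return [english for english, translation in leq.items() if not translation]


def shared_translations(leq):
    """Translations that are assigned to more than one English term."""
    groups = defaultdict(list)
    for english, translation in leq.items():
        if translation:
            groups[translation].append(english)
    return {t: terms for t, terms in groups.items() if len(terms) > 1}


# Original (pre-revision) French equivalents from the slide,
# plus one made-up entry with no translation.
french = {
    "Purchasing": "Achats",
    "Shopping": "Achats",
    "Buyers": "Acheteurs",
    "Purchasing agents": "Acheteurs",
    "Hypothetical term": None,
}
```

Here `shared_translations(french)` groups "Purchasing" with "Shopping" and "Buyers" with "Purchasing agents", the two collisions that the revised column on the slide resolves.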
  • 27. Implementing Synaptica Final Challenge Issue: Different editorial systems = 2x data entry: once for Synaptica, once for Oracle Solution: Overnight synchronization process to copy Synaptica work into Oracle every night Synch process discontinued April 2008
  • 28. Implementing Synaptica Putting Synaptica Into Production Nov 2004 Train users — provide documentation & hands-on demonstrative training Deal with people resistant to change Encourage written feedback on system functionality Send feedback to Synaptica – many of our suggestions implemented in later versions
  • 29. Life With Synaptica Life With Synaptica Terms Management Made Easy! Word – Old, Bad; Synaptica – New, Good
  • 30. Life With Synaptica Adding Terms Today: 3 Easy Steps 1. Enter term and relationships into Synaptica “Item Details” window 2. Export report of new terms into Word 3. Send Word document to editors
  • 31. Life With Synaptica Improving Thesaurus Management Categories Feature
  • 32. Life With Synaptica Subject Term Categories
  • 33. Life With Synaptica CORP Names – Categories & Website
  • 34. Life With Synaptica Foreign-Language Vocabularies Language Equivalents
  • 35. Life With Synaptica Foreign-Language Vocabularies
  • 36. Life With Synaptica Foreign-Language Vocabularies Spanish Spanish Alphabetical by language German French
  • 37. Life With Synaptica Synaptica Updates Synaptica version 6.0 released in early 2006 Synaptica version 7.0 is being implemented now: • Enhanced user interface • Semantic Web standardization (RDF, OWL, SKOS) and Web Services integration • Expanded Reporting functionality • Enhanced adding and editing of term relationships including “rapid-fire” simple drag-and-drop editing • Improved global term editing • Online help and user guides
  • 38. Life With Synaptica Benefits of Synaptica Greater awareness of thesaurus standards and terminology, e.g.: “preferred” and “non-preferred” instead of Use and Used For Long-needed updating and improvement in term hierarchies; ability to provide thesaurus statistics Increase in Company name NPTs — from 1935 to 8952 today Immediate responsiveness to indexer needs — real-time term additions, esp. NPTs and SNs Easier loading of updated Thesaurus on PQ interface