SlideShare una empresa de Scribd logo
1 de 22
Using a Semantic Analysis Tool to
Generate Subject Access Points:
A Study using Panofsky’s Theory
and Two Research Samples
Marcia Lei Zeng
Karen F. Gracy,
Kent State University, USA
&
Maja Žumer
University of Ljubljana, Slovenia
13th International Conference (ISKO 2014)
Krakow, Poland,. May 19th-22nd 2014
Background
The Research Question:
• the assessment of alternative approaches of generating
subject access points to the materials that are usually
not made available through regular library catalog
routines.
– Subject access is critical for cross-institutional digital
libraries.
– Limited subject access points are particularly critical with
very large-scale resources of cross-institutional collections.
– LAMs are recognizing the impracticality and impossibility
of conducting exhaustive traditional subject analysis.
Example of alternative:
-- Natural language processing and semantic annotation
software-suggested
access points:
• named entities
• topics)
• relations of the
contents of a given
resource
Panofsky’s three-layer framework and the simplified layers used by CCO
An example of
Using a Semantic Analysis Tool to
Generate Subject Access Points
Finding Aids : Pearl Harbor Attack (Dec 6-Dec 8, 1941)
Source: http://www.fdrlibrary.marist.edu/archives/pdfs/findingaids/findingaid_pearlharborattack.pdf
Source: http://www.fdrlibrary.marist.edu/archives/pdfs/findingaids/findingaid_pearlharborattack.pdf
(cont.) Finding Aids : Pearl Harbor Attack (Dec 6-Dec 8, 1941)
Two research samples were used to
analyze the access points
supplied by
OpenCalais semantic analysis tool
Input • batch process
• single file upload
• copy-paste
1. Obtain text
• call OpenCalais
• perform entity
extraction
2. Extract
entities & tags • convert from
JSON to CSV
• clean up through
OpenRefine
3. Convert &
Clean up
text
Output
structured
data
• 43 archival record groups
• from sixteen institutions,
including
– university archives,
– government records archives,
and
– manuscript/special collections
repositories in various LAMs.
• Text from the archival
finding aids
– Descriptive information
• creator histories
• scope and content notes
• detailed description of contents,
including folder and item titles
– Abstracts from these descriptions
The Process (Mainly automatic)
Sample 1: Finding Aids
Original Panofsky Object
of Interpretation
Level 1
-Primary or natural
subject matter
(A) factual,
(B) expressional-,
constituting the world of
artistic motifs
Level 2
-Secondary or
conventional subject
matter, constituting the
world of images, stories
and allegories.
CCO Simplified layers
1 - Description
(refer to the generic elements
depicted in or by the work).
2 - Identification
(refer to the specific subject).
ofness
ofness & aboutness &
(limited from ofness aspects)
Entities correctly identified via
Calais analysis (at level one, or,
description) included:
– personal names (Person),
– corporate names (Company,
Facility, Organization),
– geographic names (City,
Continent, Country, Natural
Feature, ProvinceOrState,
Region), and
– events (Holiday, PoliticalEvent).
Calais provides relevance scores
for each identified entity, which
may be used as a valuable clue
about the importance of that
entity to the overall scope of the
archival collection.
Findings
In addition to entities, Calais also generated many
topical terms describing the subject matter of the
records (level two, or, identification)
• These topics were often found
– as social tags or
– as entities under the “IndustryTerm”
– or as entities “Product” category.
These categorizations were the least reliable in terms
of accuracy:
• incorrectly identified text strings from the finding aids as
products or industry terms.
• analysis of detailed description areas was most likely to
lead to incorrect identification of text strings because the
descriptions have the physical location information
intermingled.
• Reason: the raw data (unedited) that was fed to the
engine:
– the entire finding aid was used
– often included physical location information for the
records and document formatting
Targeted analysis of particular areas of the finding aids
may result in better accuracy for topical analysis.
Findings
Suggestions based on Sample 1 (Finding Aids)
• It would be well worth the effort for institutions to
experiment with semantic analysis methods as
– an initial step to suggest key entities and topics, or
– as a final check to ensure that important concepts or
entities have not been overlooked.
• For certain types of records, particularly those for
which subject indexing is not common, semantic
analysis may provide entry points to archival records
that were not previously available.
• Such techniques will enhance subject analysis at the
first two levels (description and identification), but are
unlikely to be useful for interpretation of the material.
Sample 2
• 44 philosophy theses
– a selected sub-sample (22)
from KentLINK; and
– a random sample (22) from
OhioLINK.
• Abstracts,
• titles,
• keywords, and
• introduction paragraphs
• Process (manual)
1. Submitted to OpenCalais separately
to obtain the results.
2. All of the candidate terms were
counted according to Agent Names,
Geographic Names, Corporate Name,
and Topic Terms.
3. They were manually validated to
determine
1. the relevance to the thesis,
2. the type of a term (e.g., named
entity, tag, or general heading),
3. its availability in
1. LCNAF,
2. LCSH,
3. Wikipedia (as an entry), and
4. the Stanford Encyclopedia of
Philosophy.
Original Panofsky Object
of Interpretation
Level 1
-Primary or natural
subject matter
(A) factual,
(B) expressional-,
constituting the world of
artistic motifs
Level 2
-Secondary or
conventional subject
matter, constituting the
world of images, stories
and allegories.
Additional level:
- Inferencing
CCO Simplified layers
1 - Description
(refer to the generic elements
depicted in or by the work).
2 - Identification
(refer to the specific subject).
ofness
ofness & aboutness &
(limited from ofness aspects)
aboutness &
(very general)
Research Findings from Sample 2
• Semantic analysis based on the
abstracts generated more
successful tags than those based
on the titles.
• Some entity names missed in
the Entity section were often
correctly extracted into the tags
section
– E.g., singular names such as
Plato and Aristotle, or
– instances where the first
name was not included
• Major concepts were correctly
identified in most cases.
• The software often over-generalized the subjects
by assigning very general terms (e.g.,
“philosophy,” for almost every philosophy thesis)
and
• Some terms that were unrelated to the subject of
the thesis.
• This level is different from “identification” and “description”,
seems to be “inferencing”.
Research Findings from Sample 2
(cont.)
KentLINK sub-sample:
• average 9 tags per abstract,
• an average of 1.64 were overly broad topical terms and
• an average of 3.45 were unrelated topical terms (slightly more
than 1/3).
OhioLINK sub-sample had similar figures.
Suggestions based on Sample 2
• Level I “description” -- the tags did very well
• Level II “identification” – adequate
• The tags that could be categorized as “inferencing” results seemed to be
less valid according to the best practices of cataloging and subject
indexing.
– The overly-broad topic terms are not wrong (e.g., philosophy, knowledge,
science) but their relevance in terms of subject access is questionable.
• The promising news: among the topical terms (including named entities as
topics),
– LCSH together with LCNAF could match about 75% of them closely (we used
the degree as closeMatch, in comparison to broadMatch, narrowMatch or
noMatch),
– DBpedia matches almost 98% with closeMatch degree for both sub-samples.
• These vocabulary sources hold great potential for these
subject access points to become the linking point to the
Linked Data datasets that use DBpedia and LC vocabulary URIs
as their basis.
Future Research
Most helpful,
good, high
exhaustivity
adequate,
(depending on the
domain and raw data)
I. description
(ofness)
II. identification
(aboutness & ofness)
maybe useful
inferencing
(aboutness)
Not applicable
III. interpretation
(aboutness)
Need User Studies
Acknowledgement
• The authors would like to thank research
assistants Sammy Davidson and Laurence
Skirvin of Kent State University for assisting
with OpenCalais-related processes.
• The research is a sub-project of Metadata-
Junctions, a project funded by IMLS National
Leadership Grant, 2011-2013.

Más contenido relacionado

Destacado

AAT LOD Microthesauri
AAT LOD MicrothesauriAAT LOD Microthesauri
AAT LOD MicrothesauriMarcia Zeng
 
国际图象互操作框架 (IIIF)
国际图象互操作框架(IIIF)国际图象互操作框架(IIIF)
国际图象互操作框架 (IIIF)Marcia Zeng
 
ResearchSpace- Example of a VRE Based on CIDOC CRM
ResearchSpace- Example of a VRE Based on CIDOC CRMResearchSpace- Example of a VRE Based on CIDOC CRM
ResearchSpace- Example of a VRE Based on CIDOC CRMVladimir Alexiev, PhD, PMP
 
Context Semantic Analysis: a knowledge-based technique for computing inter-do...
Context Semantic Analysis: a knowledge-based technique for computing inter-do...Context Semantic Analysis: a knowledge-based technique for computing inter-do...
Context Semantic Analysis: a knowledge-based technique for computing inter-do...Fabio Benedetti
 
Ma. teresa actividad 1.2 unidad iv
Ma. teresa actividad 1.2 unidad ivMa. teresa actividad 1.2 unidad iv
Ma. teresa actividad 1.2 unidad ivTere Segura
 
Inocencio meléndez julio. investigación. importancia de los costos y presup...
Inocencio meléndez julio. investigación. importancia de los costos y presup...Inocencio meléndez julio. investigación. importancia de los costos y presup...
Inocencio meléndez julio. investigación. importancia de los costos y presup...INOCENCIO MELÉNDEZ JULIO
 
Manifiesto para la simplificacion de los sistemas de calidad
Manifiesto para la simplificacion de los sistemas de calidadManifiesto para la simplificacion de los sistemas de calidad
Manifiesto para la simplificacion de los sistemas de calidadOrganiza tu empresa
 
What about social media marketing
What about social media marketingWhat about social media marketing
What about social media marketingSophia Müller
 
Solarstorm im Fokus: Gleichzeitigkeit von Produktion und Verbrauch
Solarstorm im Fokus: Gleichzeitigkeit von Produktion und VerbrauchSolarstorm im Fokus: Gleichzeitigkeit von Produktion und Verbrauch
Solarstorm im Fokus: Gleichzeitigkeit von Produktion und VerbrauchVorname Nachname
 
Inocencio meléndez julio. preacuerdo empresarial. el interaprendizaje es el...
Inocencio meléndez julio. preacuerdo empresarial.  el interaprendizaje es el...Inocencio meléndez julio. preacuerdo empresarial.  el interaprendizaje es el...
Inocencio meléndez julio. preacuerdo empresarial. el interaprendizaje es el...INOCENCIO MELÉNDEZ JULIO
 
Inocencio meléndez julio. preacuerdo empresarial. metodologia del trabajo a...
Inocencio meléndez julio. preacuerdo empresarial.  metodologia del trabajo a...Inocencio meléndez julio. preacuerdo empresarial.  metodologia del trabajo a...
Inocencio meléndez julio. preacuerdo empresarial. metodologia del trabajo a...INOCENCIO MELÉNDEZ JULIO
 
Inocencio meléndez julio. estado de origen y uso de fondos. inocencio melend...
Inocencio meléndez julio. estado de origen y uso de fondos. inocencio melend...Inocencio meléndez julio. estado de origen y uso de fondos. inocencio melend...
Inocencio meléndez julio. estado de origen y uso de fondos. inocencio melend...INOCENCIO MELÉNDEZ JULIO
 
Codigo12/Concurso-SJC2015
Codigo12/Concurso-SJC2015Codigo12/Concurso-SJC2015
Codigo12/Concurso-SJC2015taniacampelo
 
Gumer sindolo 99
Gumer sindolo 99Gumer sindolo 99
Gumer sindolo 99gumer-sindo
 

Destacado (20)

AAT LOD Microthesauri
AAT LOD MicrothesauriAAT LOD Microthesauri
AAT LOD Microthesauri
 
国际图象互操作框架 (IIIF)
国际图象互操作框架(IIIF)国际图象互操作框架(IIIF)
国际图象互操作框架 (IIIF)
 
ResearchSpace- Example of a VRE Based on CIDOC CRM
ResearchSpace- Example of a VRE Based on CIDOC CRMResearchSpace- Example of a VRE Based on CIDOC CRM
ResearchSpace- Example of a VRE Based on CIDOC CRM
 
Comm 125 Portfolio
Comm 125 Portfolio Comm 125 Portfolio
Comm 125 Portfolio
 
Bus
BusBus
Bus
 
Presentacion trm 1
Presentacion trm 1 Presentacion trm 1
Presentacion trm 1
 
Context Semantic Analysis: a knowledge-based technique for computing inter-do...
Context Semantic Analysis: a knowledge-based technique for computing inter-do...Context Semantic Analysis: a knowledge-based technique for computing inter-do...
Context Semantic Analysis: a knowledge-based technique for computing inter-do...
 
Ma. teresa actividad 1.2 unidad iv
Ma. teresa actividad 1.2 unidad ivMa. teresa actividad 1.2 unidad iv
Ma. teresa actividad 1.2 unidad iv
 
Inocencio meléndez julio. investigación. importancia de los costos y presup...
Inocencio meléndez julio. investigación. importancia de los costos y presup...Inocencio meléndez julio. investigación. importancia de los costos y presup...
Inocencio meléndez julio. investigación. importancia de los costos y presup...
 
Manifiesto para la simplificacion de los sistemas de calidad
Manifiesto para la simplificacion de los sistemas de calidadManifiesto para la simplificacion de los sistemas de calidad
Manifiesto para la simplificacion de los sistemas de calidad
 
What about social media marketing
What about social media marketingWhat about social media marketing
What about social media marketing
 
Wärme über die Gasse
Wärme über die GasseWärme über die Gasse
Wärme über die Gasse
 
Solarstorm im Fokus: Gleichzeitigkeit von Produktion und Verbrauch
Solarstorm im Fokus: Gleichzeitigkeit von Produktion und VerbrauchSolarstorm im Fokus: Gleichzeitigkeit von Produktion und Verbrauch
Solarstorm im Fokus: Gleichzeitigkeit von Produktion und Verbrauch
 
Inocencio meléndez julio. preacuerdo empresarial. el interaprendizaje es el...
Inocencio meléndez julio. preacuerdo empresarial.  el interaprendizaje es el...Inocencio meléndez julio. preacuerdo empresarial.  el interaprendizaje es el...
Inocencio meléndez julio. preacuerdo empresarial. el interaprendizaje es el...
 
Inocencio meléndez julio. preacuerdo empresarial. metodologia del trabajo a...
Inocencio meléndez julio. preacuerdo empresarial.  metodologia del trabajo a...Inocencio meléndez julio. preacuerdo empresarial.  metodologia del trabajo a...
Inocencio meléndez julio. preacuerdo empresarial. metodologia del trabajo a...
 
Interruptores termo magnéticos
Interruptores termo magnéticosInterruptores termo magnéticos
Interruptores termo magnéticos
 
Relajate escucha mira_y_admira
Relajate  escucha mira_y_admiraRelajate  escucha mira_y_admira
Relajate escucha mira_y_admira
 
Inocencio meléndez julio. estado de origen y uso de fondos. inocencio melend...
Inocencio meléndez julio. estado de origen y uso de fondos. inocencio melend...Inocencio meléndez julio. estado de origen y uso de fondos. inocencio melend...
Inocencio meléndez julio. estado de origen y uso de fondos. inocencio melend...
 
Codigo12/Concurso-SJC2015
Codigo12/Concurso-SJC2015Codigo12/Concurso-SJC2015
Codigo12/Concurso-SJC2015
 
Gumer sindolo 99
Gumer sindolo 99Gumer sindolo 99
Gumer sindolo 99
 

Similar a Using a Semantic Analysis Tool to Generate Subject Access Points: A Study using Panofsky’s Theory and Two Research Samples

Research Methods in Architecture - Literature Review - البحث المعمارى - البحث...
Research Methods in Architecture - Literature Review - البحث المعمارى - البحث...Research Methods in Architecture - Literature Review - البحث المعمارى - البحث...
Research Methods in Architecture - Literature Review - البحث المعمارى - البحث...Galala University
 
Arc 323 human studies in architecture fall 2018 lecture 3-literature review
Arc 323 human studies in architecture fall 2018 lecture 3-literature reviewArc 323 human studies in architecture fall 2018 lecture 3-literature review
Arc 323 human studies in architecture fall 2018 lecture 3-literature reviewGalala University
 
The Literature and Study Review and Ethical Concern
The Literature and Study  Review and Ethical ConcernThe Literature and Study  Review and Ethical Concern
The Literature and Study Review and Ethical ConcernJo Balucanag - Bitonio
 
Research Sources & Techniques
Research Sources & TechniquesResearch Sources & Techniques
Research Sources & TechniquesGina Singh
 
Print source literature 24 March 2023.pptx
Print source literature 24 March 2023.pptxPrint source literature 24 March 2023.pptx
Print source literature 24 March 2023.pptxsanjaychavan62
 
Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...
Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...
Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...Giannis Tsakonas
 
Research Sources
Research SourcesResearch Sources
Research SourcesGina Singh
 
Lib 103 fall 2010 third ed textbook
Lib 103 fall 2010 third ed textbookLib 103 fall 2010 third ed textbook
Lib 103 fall 2010 third ed textbookPAPrice
 
A Genre Analysis Of Literature Reviews In Doctoral Theses
A Genre Analysis Of Literature Reviews In Doctoral ThesesA Genre Analysis Of Literature Reviews In Doctoral Theses
A Genre Analysis Of Literature Reviews In Doctoral ThesesSarah Adams
 
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...Khirulnizam Abd Rahman
 
Research Sources & Techniques
Research Sources  & TechniquesResearch Sources  & Techniques
Research Sources & TechniquesGina Singh
 
Research Sources and Techniques
Research Sources and TechniquesResearch Sources and Techniques
Research Sources and TechniquesGina Singh
 
Review of literature
Review of literatureReview of literature
Review of literaturePeEr FaiZan
 
Aspects And Methods Of Fictional Literature Knowledge Organization
Aspects And Methods Of Fictional Literature Knowledge OrganizationAspects And Methods Of Fictional Literature Knowledge Organization
Aspects And Methods Of Fictional Literature Knowledge OrganizationJoe Andelija
 
From TeacherTo assist you with preparing the Week 7 assignment.docx
From TeacherTo assist you with preparing the Week 7 assignment.docxFrom TeacherTo assist you with preparing the Week 7 assignment.docx
From TeacherTo assist you with preparing the Week 7 assignment.docxhanneloremccaffery
 
Literature review
Literature reviewLiterature review
Literature reviewunmgrc
 
Research methods presentation
Research methods presentationResearch methods presentation
Research methods presentationjjory7
 
UNIT III ( Review of literature.pptx
UNIT III ( Review of literature.pptxUNIT III ( Review of literature.pptx
UNIT III ( Review of literature.pptxPrincipalVMSCON
 

Similar a Using a Semantic Analysis Tool to Generate Subject Access Points: A Study using Panofsky’s Theory and Two Research Samples (20)

Research Methods in Architecture - Literature Review - البحث المعمارى - البحث...
Research Methods in Architecture - Literature Review - البحث المعمارى - البحث...Research Methods in Architecture - Literature Review - البحث المعمارى - البحث...
Research Methods in Architecture - Literature Review - البحث المعمارى - البحث...
 
Arc 323 human studies in architecture fall 2018 lecture 3-literature review
Arc 323 human studies in architecture fall 2018 lecture 3-literature reviewArc 323 human studies in architecture fall 2018 lecture 3-literature review
Arc 323 human studies in architecture fall 2018 lecture 3-literature review
 
The Literature and Study Review and Ethical Concern
The Literature and Study  Review and Ethical ConcernThe Literature and Study  Review and Ethical Concern
The Literature and Study Review and Ethical Concern
 
Research Sources & Techniques
Research Sources & TechniquesResearch Sources & Techniques
Research Sources & Techniques
 
Print source literature 24 March 2023.pptx
Print source literature 24 March 2023.pptxPrint source literature 24 March 2023.pptx
Print source literature 24 March 2023.pptx
 
Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...
Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...
Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...
 
Research Sources
Research SourcesResearch Sources
Research Sources
 
Lib 103 fall 2010 third ed textbook
Lib 103 fall 2010 third ed textbookLib 103 fall 2010 third ed textbook
Lib 103 fall 2010 third ed textbook
 
A Genre Analysis Of Literature Reviews In Doctoral Theses
A Genre Analysis Of Literature Reviews In Doctoral ThesesA Genre Analysis Of Literature Reviews In Doctoral Theses
A Genre Analysis Of Literature Reviews In Doctoral Theses
 
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
 
Research Sources & Techniques
Research Sources  & TechniquesResearch Sources  & Techniques
Research Sources & Techniques
 
Research Sources and Techniques
Research Sources and TechniquesResearch Sources and Techniques
Research Sources and Techniques
 
Review of literature
Review of literatureReview of literature
Review of literature
 
Conducting a literature search
Conducting a literature searchConducting a literature search
Conducting a literature search
 
Aspects And Methods Of Fictional Literature Knowledge Organization
Aspects And Methods Of Fictional Literature Knowledge OrganizationAspects And Methods Of Fictional Literature Knowledge Organization
Aspects And Methods Of Fictional Literature Knowledge Organization
 
From TeacherTo assist you with preparing the Week 7 assignment.docx
From TeacherTo assist you with preparing the Week 7 assignment.docxFrom TeacherTo assist you with preparing the Week 7 assignment.docx
From TeacherTo assist you with preparing the Week 7 assignment.docx
 
Literature review
Literature reviewLiterature review
Literature review
 
rm 2.ppt
rm 2.pptrm 2.ppt
rm 2.ppt
 
Research methods presentation
Research methods presentationResearch methods presentation
Research methods presentation
 
UNIT III ( Review of literature.pptx
UNIT III ( Review of literature.pptxUNIT III ( Review of literature.pptx
UNIT III ( Review of literature.pptx
 

Más de Marcia Zeng

理解和利用关联数据 --图情档博(LAM)作为关联数据的提供者和消费者
理解和利用关联数据 --图情档博(LAM)作为关联数据的提供者和消费者 理解和利用关联数据 --图情档博(LAM)作为关联数据的提供者和消费者
理解和利用关联数据 --图情档博(LAM)作为关联数据的提供者和消费者 Marcia Zeng
 
Contributing to the Smart City Through Linked Library Data
Contributing to the Smart City Through Linked Library DataContributing to the Smart City Through Linked Library Data
Contributing to the Smart City Through Linked Library DataMarcia Zeng
 
Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...Marcia Zeng
 
Modelling Knowledge Organization Systems and Structures
Modelling Knowledge Organization Systems and StructuresModelling Knowledge Organization Systems and Structures
Modelling Knowledge Organization Systems and StructuresMarcia Zeng
 
FRSAD Functional Requirements for Subject Authority Data model
FRSAD Functional Requirements for Subject Authority Data modelFRSAD Functional Requirements for Subject Authority Data model
FRSAD Functional Requirements for Subject Authority Data modelMarcia Zeng
 
SKOS for Classification Systems
SKOS for Classification SystemsSKOS for Classification Systems
SKOS for Classification SystemsMarcia Zeng
 
Linking KOS Data [using SKOS and OWL2]
Linking KOS Data [using SKOS and OWL2]Linking KOS Data [using SKOS and OWL2]
Linking KOS Data [using SKOS and OWL2]Marcia Zeng
 
ISO 25964: Thesauri and Interoperability with Other Vocabularies
ISO 25964: Thesauri and Interoperability with Other VocabulariesISO 25964: Thesauri and Interoperability with Other Vocabularies
ISO 25964: Thesauri and Interoperability with Other VocabulariesMarcia Zeng
 
Expressing Classification Schemes -- Part 3
Expressing Classification Schemes -- Part 3Expressing Classification Schemes -- Part 3
Expressing Classification Schemes -- Part 3Marcia Zeng
 
Introducing FRSAD and Mapping it with Other Models
Introducing FRSAD and Mapping it with Other ModelsIntroducing FRSAD and Mapping it with Other Models
Introducing FRSAD and Mapping it with Other ModelsMarcia Zeng
 
SKOS and Its Application in Transferring Traditional Thesauri into Networked KOS
SKOS and Its Application in Transferring Traditional Thesauri into Networked KOSSKOS and Its Application in Transferring Traditional Thesauri into Networked KOS
SKOS and Its Application in Transferring Traditional Thesauri into Networked KOSMarcia Zeng
 
Metadata for Terminology / KOS Resources
Metadata for Terminology / KOS ResourcesMetadata for Terminology / KOS Resources
Metadata for Terminology / KOS ResourcesMarcia Zeng
 
Metadata and Terminology Registries
Metadata and Terminology RegistriesMetadata and Terminology Registries
Metadata and Terminology RegistriesMarcia Zeng
 
Dublin Core In Practice
Dublin Core In PracticeDublin Core In Practice
Dublin Core In PracticeMarcia Zeng
 

Más de Marcia Zeng (14)

理解和利用关联数据 --图情档博(LAM)作为关联数据的提供者和消费者
理解和利用关联数据 --图情档博(LAM)作为关联数据的提供者和消费者 理解和利用关联数据 --图情档博(LAM)作为关联数据的提供者和消费者
理解和利用关联数据 --图情档博(LAM)作为关联数据的提供者和消费者
 
Contributing to the Smart City Through Linked Library Data
Contributing to the Smart City Through Linked Library DataContributing to the Smart City Through Linked Library Data
Contributing to the Smart City Through Linked Library Data
 
Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...
 
Modelling Knowledge Organization Systems and Structures
Modelling Knowledge Organization Systems and StructuresModelling Knowledge Organization Systems and Structures
Modelling Knowledge Organization Systems and Structures
 
FRSAD Functional Requirements for Subject Authority Data model
FRSAD Functional Requirements for Subject Authority Data modelFRSAD Functional Requirements for Subject Authority Data model
FRSAD Functional Requirements for Subject Authority Data model
 
SKOS for Classification Systems
SKOS for Classification SystemsSKOS for Classification Systems
SKOS for Classification Systems
 
Linking KOS Data [using SKOS and OWL2]
Linking KOS Data [using SKOS and OWL2]Linking KOS Data [using SKOS and OWL2]
Linking KOS Data [using SKOS and OWL2]
 
ISO 25964: Thesauri and Interoperability with Other Vocabularies
ISO 25964: Thesauri and Interoperability with Other VocabulariesISO 25964: Thesauri and Interoperability with Other Vocabularies
ISO 25964: Thesauri and Interoperability with Other Vocabularies
 
Expressing Classification Schemes -- Part 3
Expressing Classification Schemes -- Part 3Expressing Classification Schemes -- Part 3
Expressing Classification Schemes -- Part 3
 
Introducing FRSAD and Mapping it with Other Models
Introducing FRSAD and Mapping it with Other ModelsIntroducing FRSAD and Mapping it with Other Models
Introducing FRSAD and Mapping it with Other Models
 
SKOS and Its Application in Transferring Traditional Thesauri into Networked KOS
SKOS and Its Application in Transferring Traditional Thesauri into Networked KOSSKOS and Its Application in Transferring Traditional Thesauri into Networked KOS
SKOS and Its Application in Transferring Traditional Thesauri into Networked KOS
 
Metadata for Terminology / KOS Resources
Metadata for Terminology / KOS ResourcesMetadata for Terminology / KOS Resources
Metadata for Terminology / KOS Resources
 
Metadata and Terminology Registries
Metadata and Terminology RegistriesMetadata and Terminology Registries
Metadata and Terminology Registries
 
Dublin Core In Practice
Dublin Core In PracticeDublin Core In Practice
Dublin Core In Practice
 

Último

Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in collegessuser7a7cd61
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 

Último (20)

Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 

Using a Semantic Analysis Tool to Generate Subject Access Points: A Study using Panofsky’s Theory and Two Research Samples

  • 1. Using a Semantic Analysis Tool to Generate Subject Access Points: A Study using Panofsky’s Theory and Two Research Samples Marcia Lei Zeng Karen F. Gracy, Kent State University, USA & Maja Žumer University of Ljubljana, Slovenia 13th International Conference (ISKO 2014) Krakow, Poland,. May 19th-22nd 2014
  • 2. Background The Research Question: • the assessment of alternative approaches of generating subject access points to the materials that are usually not made available through regular library catalog routines. – Subject access is critical for cross-institutional digital libraries. – Limited subject access points are particularly critical with very large-scale resources of cross-institutional collections. – LAMs are recognizing the impracticality and impossibility of conducting exhaustive traditional subject analysis.
  • 3. Example of alternative: -- Natural language processing and semantic annotation software-suggested access points: • named entities • topics) • relations of the contents of a given resource
  • 4. Panofsky’s three-layer framework and the simplified layers used by CCO
  • 5. An example of Using a Semantic Analysis Tool to Generate Subject Access Points
  • 6. Finding Aids : Pearl Harbor Attack (Dec 6-Dec 8, 1941) Source: http://www.fdrlibrary.marist.edu/archives/pdfs/findingaids/findingaid_pearlharborattack.pdf
  • 8.
  • 9.
  • 10. Two research samples were used to analyze the access points supplied by OpenCalais semantic analysis tool
  • 11. Input • batch process • single file upload • copy-paste 1. Obtain text • call OpenCalais • perform entity extraction 2. Extract entities & tags • convert from JSON to CSV • clean up through OpenRefine 3. Convert & Clean up text Output structured data • 43 archival record groups • from sixteen institutions, including – university archives, – government records archives, and – manuscript/special collections repositories in various LAMs. • Text from the archival finding aids – Descriptive information • creator histories • scope and content notes • detailed description of contents, including folder and item titles – Abstracts from these descriptions The Process (Mainly automatic) Sample 1: Finding Aids
  • 12. Original Panofsky Object of Interpretation Level 1 -Primary or natural subject matter (A) factual, (B) expressional-, constituting the world of artistic motifs Level 2 -Secondary or conventional subject matter, constituting the world of images, stories and allegories. CCO Simplified layers 1 - Description (refer to the generic elements depicted in or by the work). 2 - Identification (refer to the specific subject). ofness ofness & aboutness & (limited from ofness aspects)
  • 13. Entities correctly identified via Calais analysis (at level one, or, description) included: – personal names (Person), – corporate names (Company, Facility, Organization), – geographic names (City, Continent, Country, Natural Feature, ProvinceOrState, Region), and – events (Holiday, PoliticalEvent). Calais provides relevance scores for each identified entity, which may be used as a valuable clue about the importance of that entity to the overall scope of the archival collection. Findings
  • 14. In addition to entities, Calais also generated many topical terms describing the subject matter of the records (level two, or, identification) • These topics were often found – as social tags or – as entities under the “IndustryTerm” – or as entities “Product” category. These categorizations were the least reliable in terms of accuracy: • incorrectly identified text strings from the finding aids as products or industry terms. • analysis of detailed description areas was most likely to lead to incorrect identification of text strings because the descriptions have the physical location information intermingled. • Reason: the raw data (unedited) that was fed to the engine: – the entire finding aid was used – often included physical location information for the records and document formatting Targeted analysis of particular areas of the finding aids may result in better accuracy for topical analysis. Findings
  • 15. Suggestions based on Sample 1 (Finding Aids) • It would be well worth the effort for institutions to experiment with semantic analysis methods as – an initial step to suggest key entities and topics, or – as a final check to ensure that important concepts or entities have not been overlooked. • For certain types of records, particularly those for which subject indexing is not common, semantic analysis may provide entry points to archival records that were not previously available. • Such techniques will enhance subject analysis at the first two levels (description and identification), but are unlikely to be useful for interpretation of the material.
  • 16. Sample 2 • 44 philosophy theses – a selected sub-sample (22) from KentLINK; and – a random sample (22) from OhioLINK. • Abstracts, • titles, • keywords, and • introduction paragraphs • Process (manual) 1. Submitted to OpenCalais separately to obtain the results. 2. All of the candidate terms were counted according to Agent Names, Geographic Names, Corporate Name, and Topic Terms. 3. They were manually validated to determine 1. the relevance to the thesis, 2. the type of a term (e.g., named entity, tag, or general heading), 3. its availability in 1. LCNAF, 2. LCSH, 3. Wikipedia (as an entry), and 4. the Stanford Encyclopedia of Philosophy.
  • 17. Original Panofsky Object of Interpretation Level 1 -Primary or natural subject matter (A) factual, (B) expressional-, constituting the world of artistic motifs Level 2 -Secondary or conventional subject matter, constituting the world of images, stories and allegories. Additional level: - Inferencing CCO Simplified layers 1 - Description (refer to the generic elements depicted in or by the work). 2 - Identification (refer to the specific subject). ofness ofness & aboutness & (limited from ofness aspects) aboutness & (very general)
  • 18. Research Findings from Sample 2 • Semantic analysis based on the abstracts generated more successful tags than those based on the titles. • Some entity names missed in the Entity section were often correctly extracted into the tags section – E.g., singular names such as Plato and Aristotle, or – instances where the first name was not included • Major concepts were correctly identified in most cases.
  • 19. • The software often over-generalized the subjects by assigning very general terms (e.g., “philosophy,” for almost every philosophy thesis) and • Some terms that were unrelated to the subject of the thesis. • This level is different from “identification” and “description”, seems to be “inferencing”. Research Findings from Sample 2 (cont.) KentLINK sub-sample: • average 9 tags per abstract, • an average of 1.64 were overly broad topical terms and • an average of 3.45 were unrelated topical terms (slightly more than 1/3). OhioLINK sub-sample had similar figures.
  • 20. Suggestions based on Sample 2 • Level I “description” -- the tags did very well • Level II “identification” – adequate • The tags that could be categorized as “inferencing” results seemed to be less valid according to the best practices of cataloging and subject indexing. – The overly-broad topic terms are not wrong (e.g., philosophy, knowledge, science) but their relevance in terms of subject access is questionable. • The promising news: among the topical terms (including named entities as topics), – LCSH together with LCNAF could match about 75% of them closely (we used the degree as closeMatch, in comparison to broadMatch, narrowMatch or noMatch), – DBpedia matches almost 98% with closeMatch degree for both sub-samples. • These vocabulary sources hold great potential for these subject access points to become the linking point to the Linked Data datasets that use DBpedia and LC vocabulary URIs as their basis.
  • 21. Future Research Most helpful, good, high exhaustivity adequate, (depending on the domain and raw data) I. description (ofness) II. identification (aboutness & ofness) maybe useful inferencing (aboutness) Not applicable III. interpretation (aboutness) Need User Studies
  • 22. Acknowledgement • The authors would like to thank research assistants Sammy Davidson and Laurence Skirvin of Kent State University for assisting with OpenCalais-related processes. • The research is a sub-project of Metadata- Junctions, a project funded by IMLS National Leadership Grant, 2011-2013.