2. Do we even need journal
metadata?
“What we want is articles,” said Gorman,
calling the idea of putting them together in
things called journals “irrelevant.”
Tenopir, Carol. “The Value of the Container.” Library
Journal 131, no. 2 (February 1, 2006): 32.
3. Consider Discoverability
“If you're using repository or journal management
software, such as Eprints, DSpace, Digital Commons
or OJS, please configure it to export bibliographic data
in HTML "<meta>" tags. Google Scholar supports
Highwire Press tags (e.g., citation_title), Eprints tags
(e.g., eprints.title), BE Press tags (e.g.,
bepress_citation_title), and PRISM tags (e.g.,
prism.title). Use Dublin Core tags (e.g., DC.title) as a
last resort - they work poorly for journal papers
because Dublin Core doesn't have unambiguous
fields for journal title, volume, issue, and page
numbers.”
Google Scholar Indexing Guidelines,
https://scholar.google.com/intl/en-us/scholar/inclusion.html#indexing
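As a rough illustration of the guideline quoted above, the following Python sketch renders a citation record as Highwire Press-style <meta> tags. The tag names are the ones Google Scholar documents; the record layout and the function name highwire_meta_tags are assumptions made for this example and do not come from any repository platform.

```python
from html import escape

# Hypothetical record keyed by Highwire Press tag name.
# Values are taken from the Tenopir citation on slide 2; the date format
# and the reading of "2/1/2006" as February 1 are assumptions.
citation = {
    "citation_title": "The Value of the Container",
    "citation_author": ["Tenopir, Carol"],
    "citation_journal_title": "Library Journal",
    "citation_publication_date": "2006/02/01",
    "citation_volume": "131",
    "citation_issue": "2",
    "citation_firstpage": "32",
    "citation_lastpage": "32",
}


def highwire_meta_tags(record):
    """Return one <meta> line per value; list values repeat the tag."""
    lines = []
    for name, value in record.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            lines.append(
                '<meta name="{}" content="{}">'.format(escape(name), escape(item))
            )
    return "\n".join(lines)


print(highwire_meta_tags(citation))
```

The output is meant to sit in the HTML <head> of the article landing page. Repeating citation_author once per author is how multiple authors would be expressed in this sketch.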
4. Example Article Citation
Elements
Chicago Manual of
Style
Article Title
Article Author
Journal Title
Journal Date
Journal Volume
Journal Issue
Page Range
Journal Article Tag
Suite (JATS)
<article>
<article-meta>
<journal-meta>
<contrib>
<ref-list>
(Peroni, Lapeyre, and
Shotton, 2012)
5. OpenDOAR Directory for IRs
Includes description, policies summary,
software platform, OAI-PMH availability, and
size
Statistics for repositories includes location,
frequent languages, frequent content types,
metadata and data re-use policies, and
content, submission and preservation policies
About 85% of repositories represented have
unknown, unstated, or undefined metadata re-
use policies
10. University of Michigan
Characteristics
DCTERMS.bibliographicCitation can refer to
pre-print or publisher’s PDF
DC.type indicates the genre is article
DC.date.issued is year of publication
12. University of Queensland
Characteristics
Include journal title, volume, issue, start page,
end page and date, plus ISSN – Highwire
Press tags
Sub-type for article not contained in <meta>
tags with other Dublin Core elements, but in
<body>
Now has Open Access Mandate Compliance
field
14. Columbia University
Characteristics
Includes Publisher and CU DOIs
Includes journal title, volume, issue, start page,
end page and date – Highwire Press tags
Uses MODS metadata schema, but not in the
<meta> tags
17. eLIS Characteristics
eprints.type and dc.type to indicate preprint or
journal article
eprints tags includes publication title, volume,
issue number and date range
Identifier examples include eprints.issn,
eprints.id_number, eprints.official_url, and
dc.identifier
19. University of Nebraska Lincoln
Characteristics
Uses bepress_citation tags – author, title and
date
The citation information for the journal is
contained in <body>
PDFs appear to be formatted according to
Google Scholar inclusion guidelines
23. UPEI Characteristics
Highwire Press tags for journal citation, except
for citation_lastpage
Additional Dublin Core elements - DC.isPartOf
also used for journal title, DC.type for Journal
Article, and DC.identifier used for PMID
pre-print status appears in record display
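The kind of check behind these repository comparisons can be sketched in Python: collect the <meta> tags from a landing page and report which expected journal citation tags are absent. This is only an illustration; the EXPECTED list, the SAMPLE_HEAD markup, and the function names are assumptions, and the sample values are placeholders rather than data from any of the repositories discussed.

```python
from html.parser import HTMLParser

# Citation tags this sketch expects for a journal article (an assumption).
EXPECTED = [
    "citation_title",
    "citation_author",
    "citation_journal_title",
    "citation_publication_date",
    "citation_volume",
    "citation_issue",
    "citation_firstpage",
    "citation_lastpage",
]


class MetaTagCollector(HTMLParser):
    """Collects name/content pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.tags.setdefault(name, []).append(content)


def missing_citation_tags(html_text):
    """Return the expected tag names that do not appear in the markup."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    return [name for name in EXPECTED if name not in collector.tags]


# Placeholder markup with every expected tag except citation_lastpage.
SAMPLE_HEAD = """
<head>
<meta name="citation_title" content="Example Article">
<meta name="citation_author" content="Author, Example">
<meta name="citation_journal_title" content="Example Journal">
<meta name="citation_publication_date" content="2014">
<meta name="citation_volume" content="1">
<meta name="citation_issue" content="2">
<meta name="citation_firstpage" content="10">
<meta name="DC.isPartOf" content="Example Journal">
</head>
"""

print(missing_citation_tags(SAMPLE_HEAD))  # prints ['citation_lastpage']
```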
26. Starting a Data Dictionary
Identifier – ISSN (ISSN:1612-9768)
Identifier – DOI (URI)
Relation-IsPartOf – journal title
Identifier-BibliographicCitation – full citation
Type - “Journal Article” :
http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Type_Vocabulary_Encoding_Scheme#JournalArticle
Type - “text” : DCMI
27. Developing Good Practices
Try some tools to practice with Dublin Core
metadata -
http://www.dublincoregenerator.com/generator.html
Examples of useful documentation for our
library include UIC Data Dictionary for
CONTENTdm, Best Practices for
CONTENTdm and Other OAI-PMH Compliant
Repositories
Examples directly related to journal articles
can be scattered across many data
dictionaries, best practices, and other
28. Use Case – Green OA
“About 50% journal articles published during
the past 12 months are freely available on the
Internet. Nearly half of those OA articles are
Green OA. There are millions of them on IRs,
traditional journal Web sites, authors’ social
network sites, and other Web sites.”
Xiaotian Chen, “Open Access Articles Reaching 50% But
Their Retrieval is Lagging,” CARLI Annual Meeting,
2014.
29. Distinguishing Article Versions
MIT metadata indicating publisher’s PDF
Example record: http://hdl.handle.net/1721.1/92550
dc.eprint.version – Final published version
dc.relation.isversionof -
http://dx.doi.org/10.1038/srep07467
30. Use Case – Zotero Integration and
IRs
COinS – recognizes genre as article, but can
be missing key citation elements
Embedded Metadata – often detects journal
articles as web pages
DOI – can record publisher’s URL, rather than
article version present in IR
Retrieve Metadata for PDF – only works if
article is indexed in Google Scholar
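To show what a COinS record with the key citation elements might look like, here is a minimal Python sketch that builds the OpenURL key/value string and wraps it in a span with class Z3988. The rft.* keys are standard OpenURL journal-format keys; the function name coins_span and the article dictionary are assumptions for illustration, and no repository platform is claimed to build its COinS this way.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(article):
    """Build a COinS <span> for a journal article.

    `article` maps OpenURL journal keys without the "rft." prefix
    (atitle, jtitle, volume, issue, spage, epage, date, aulast, ...)
    to string values. Empty values are skipped.
    """
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    pairs += [("rft." + key, value) for key, value in article.items() if value]
    # URL-encode the pairs, then HTML-escape the "&" separators.
    return '<span class="Z3988" title="{}"></span>'.format(escape(urlencode(pairs)))


# Values from the Tenopir citation on slide 2; the date format is an assumption.
article = {
    "atitle": "The Value of the Container",
    "jtitle": "Library Journal",
    "volume": "131",
    "issue": "2",
    "spage": "32",
    "epage": "32",
    "date": "2006-02-01",
    "aulast": "Tenopir",
    "aufirst": "Carol",
}

print(coins_span(article))
```

A span that carries only rft.genre and rft.atitle would still be recognized as an article, which matches the behavior noted above: the genre is detected while volume, issue, and pages go missing.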
31. Use Case – Open URL Link
Resolver
SFX links to Google Scholar via
getWebSearch, which is a citation title search
Could the link resolver link to IRs individually or,
more likely, to a collection of IR metadata, such
as OpenDOAR?
33. Final Considerations
Start with the specific use cases for your own
institution
Evaluate your policies in light of OpenDOAR
policy guidelines
Don’t be afraid to share your metadata and
your documentation
Editor's Notes
The first place we look for a repository's policies is its OAI-PMH Identify response (e.g. for Nottingham EPrints - http://eprints.nottingham.ac.uk/perl/oai2?verb=Identify). This often includes standard sections for policies. Alternatively, we look for a relevant web page in the repository - a special 'Policies' page or an 'About' page. We then analyse the policies using standard criteria, and assign a grade for each policy.
If we are unable to find any policy information at all, the status is set to 'Unknown'. If there is information on policies, but the particular policy is not covered, the status is set to 'Unstated'. In some cases, there may be a slot for the relevant policy, but all it says is 'not yet defined'. In these cases we set the status to 'Undefined'.
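The three statuses described above can be expressed as a small decision function. The Python sketch below is an assumption-laden illustration, not OpenDOAR's actual procedure: the function name policy_status, the dictionary layout, the treatment of an empty slot as 'Undefined', and the 'Defined' label for a policy that would go on to be graded are all hypothetical.

```python
def policy_status(policies, policy_name):
    """Classify one policy for a repository.

    `policies` is None (or empty) when no policy information was found
    at all; otherwise it maps policy names to the text that was found.
    """
    if not policies:
        return "Unknown"      # no policy information anywhere
    if policy_name not in policies:
        return "Unstated"     # policies exist, but not this one
    text = (policies[policy_name] or "").strip().lower()
    if text in ("", "not yet defined"):
        return "Undefined"    # a slot exists but says nothing usable
    return "Defined"          # hypothetical label: ready to be graded


# Hypothetical repository with a content policy and an empty metadata slot.
found = {"content": "Peer-reviewed articles only", "metadata": "Not yet defined"}

for name in ("content", "metadata", "preservation"):
    print(name, policy_status(found, name))
print("metadata", policy_status(None, "metadata"))
```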
Sorted the OpenDOAR list of repositories to find the largest, and then picked examples from particular platforms.
University of Michigan Deep Blue example – has some special fields added to DSpace? Zotero detects this metadata as Embedded Metadata. The OAI-PMH URL is listed in OpenDOAR. DC.type “Article” and the Highwire Press tags in the HTML <head> element are basically DSpace out of the box; i.e., you don’t have to customize these, because DSpace already has the SEO features built in.
The University of Queensland eSpace IR is able to add additional information about articles through its scoping feature – this includes a limiter for journal articles and also for Scopus articles. Where does this information come from?
Columbia University has a Fedora repository with a MODS schema, right? They support OAI-PMH - http://academiccommons.columbia.edu/catalog/oai?verb=Identify. Appears to be MODS only, or is this Highwire Press? The save to Zotero is DOI.
MODS metadata plus additional <meta> tags
Includes Publisher and CU DOIs
eLIS and Google Scholar – could not retrieve metadata via Zotero for PDF I found in Google Scholar. Saves to Zotero as Embedded Metadata, but as a web page.
University of Nebraska has a BePress repository, which uses BePress tags. BePress can be shared through OCLC(?), and is working with Google? Can’t see the citation or DOI in the meta tags, but they do appear on the main screen display; no volume, issue, page range or DOI saves to Zotero. Saves to Zotero as an article via COinS.
This example has almost all of the expected elements of a traditional journal citation, and it saves to Zotero as a DOI. Look for ISSNs in all my examples. Bielefeld is on the Linking Open Data (LOD) Cloud diagram, but their OpenDOAR listing states that their metadata re-use policy is explicitly undefined. According to the linked data model, however, they have published the dataset themselves rather than relying on others to harvest it, perhaps? Two different approaches – active to publish, passive to let it be harvested?
This example also saves to Zotero as DOI; the Zotero citation includes the full citation, DOI, ISSN, CrossRef, and a link to Wiley’s website. One of the DC.identifier values is the PubMed ID, which links to the PubMed article in the display.
Example of local adaptations that people may make – these are the adaptations I looked to make with CONTENTdm.
Are we satisfied with the lack of best practices? Is it sufficient to use DC, MODS, BePress, and Highwire tags as needed for our own particular context? What would we do if we shared the data more broadly? Is linking on citation title enough? Focused on examples that were the most extensive – many data dictionaries focus on controlled vocabulary, which is primarily subject terms, but can be other things(?). CSU also focused on authority control for names. CSU’s internal identifier schema does include a prefix for articles as a genre, for statistical purposes (p. 24).
Compare how Zotero finds article citation information – how does it break out all the needed citation elements from an IR? Zotero also depends on Google Scholar to do some of the work for PDFs. Essentially, if the article can be found in Google Scholar and you save a copy from the IR, you should be able to use the Retrieve Metadata for PDF feature. This doesn’t always work, though; for instance, it failed with the Kousha and Thelwall article, even though it is in eprints. I found this article by searching Google, not the interface of the IR that holds it, so I never reached the IR landing page for the article; that makes it important for the PDF itself to have embedded metadata.
Eprints.rclis.org has inconsistent embedded metadata: Zotero was able to identify a conference paper, but identified a journal article as a web page – the De Robbio article from the example slide. For the De Robbio article, Zotero captured the journal title as the website title and “journal article” as the website type.
Give demo of adding OpenDOAR to SFX as discoverability.
You can link from WorldCat to Google Scholar for the citation to this document in an IR, but the link uses the getWebSearch target in SFX, which is a keyword title search. Remember, the basis of Google Scholar’s links is the citation title, not the journal title. The link is available in WorldCat because of Elsevier’s partnership with OCLC.