What is Text and Data Mining (TDM)? How does it fit with Open Access and Open Science? Why should librarians, information professionals and research support administrators care about it? This webinar describes TDM, explains why machine accessibility matters for Open Access content, and shows how open content can be used for TDM purposes. It also points to TDM readings and courses designed for those who work in libraries or research offices providing research support.
Links:
• slide 5 —> https://blog.core.ac.uk/2015/10/19/7-tips-for-successful-harvesting
• slide 6 —> https://core.ac.uk/services
• slide 11 —> http://blogs.lse.ac.uk/impactofsocialsciences/2018/03/22/releasing-1-8-million-open-access-publications-from-publisher-systems-for-text-and-data-mining/
• slide 13 —> https://www.fosteropenscience.eu/openminted
• slide 16 —> https://www.fosteropenscience.eu/node/2263
1. Infographic
Access the connector: http://publisher-connector.core.ac.uk/resourcesync
Diagram labels: Discovery services; Proprietary APIs; Connector layer; Frontiers; Crossref; CORE Publisher Connector
• 1,831,877 Open Access articles seamlessly accessible by everyone (the graphic also shows the counts 492,462, 59,512, 172,812 and 1,107,091, which sum to this total)
• 7% of the total content available from the above publishers is Open Access
• Every record contains metadata (Title, Authors, Publisher, DOI, ...) and full text (PDF)
• All resources are accessible via ResourceSync, and more publishers are on the way
• Every resource is automatically synchronised across all clients: the master copy is exposed through ResourceSync sitemaps, and each synchronised copy receives automated synchronisation and immediate propagation of deletion
The largest datasets for text mining (Gold Open Access), by number of resources:
- arXiv: 1,261,533
- PubMed Central (OA subset): 1,582,188
- CORE Publisher Connector: 1,660,625
For the largest collection of Green & Gold Open Access content, look at https://core.ac.uk/services#dataset
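The synchronisation idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the CORE implementation: it assumes a ResourceSync change list in the standard sitemap format, and the example.org URLs, the timestamps and the helper name apply_change_list are invented for the example. It shows how a client could apply "created", "updated" and "deleted" entries to a local mirror so that deletions propagate.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the Sitemaps protocol and by ResourceSync.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# Hypothetical change list: URLs and timestamps are invented for illustration.
CHANGE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2018-01-01T00:00:00Z"/>
  <url>
    <loc>https://example.org/articles/1.pdf</loc>
    <lastmod>2018-01-02T10:00:00Z</lastmod>
    <rs:md change="created"/>
  </url>
  <url>
    <loc>https://example.org/articles/2.pdf</loc>
    <lastmod>2018-01-03T09:30:00Z</lastmod>
    <rs:md change="deleted"/>
  </url>
</urlset>"""


def apply_change_list(xml_text, mirror):
    """Apply change-list entries to a local mirror (dict: URL -> lastmod)."""
    root = ET.fromstring(xml_text)
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        md = url.find("rs:md", NS)
        change = md.get("change") if md is not None else None
        if change == "deleted":
            # Deletion at the source is propagated to the local copy.
            mirror.pop(loc, None)
        elif change in ("created", "updated"):
            mirror[loc] = lastmod
    return mirror


# The mirror starts with one article that the source has since deleted.
mirror = {"https://example.org/articles/2.pdf": "2017-12-01T00:00:00Z"}
mirror = apply_change_list(CHANGE_LIST, mirror)
print(sorted(mirror))
```

A real client would also download the changed resources and check their hashes; the sketch only tracks which URLs the mirror should hold.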
2. Presentation of the expertise directory
Knoth, P., Anastasiou, L., Pearce, S. and Pontika, M. (2018) Towards a Global Comprehensive Dataset of Open Access Papers for Text Analytics, Open Repositories 2018, Bozeman, Montana
FORCE2017 Conference – workshop on “Improve interoperability across publisher platforms to support text and data mining” – 33 publishers attended
3. Dataset statistics
Source type | Details | Number of open access articles
Repositories and full OA publishers (OpenAIRE and CORE) | 3,667 data sources globally harvested using OAI-PMH | 9,033,808
CORE Publisher Connector | Elsevier | 1,191,785
CORE Publisher Connector | Springer | 540,889
CORE Publisher Connector | Frontiers | 65,927
CORE Publisher Connector | PLoS | 179,571
Total publisher connector | | 1,978,172
Total Dataset | | 11,011,980
Knoth, P., Anastasiou, L., Pearce, S. and Pontika, M. (2018) Towards a Global Comprehensive Dataset of Open Access Papers for Text Analytics, Open Repositories 2018, Bozeman, Montana
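As a quick consistency check on the dataset statistics above, the subtotals and the grand total can be recomputed from the per-source figures. The variable names below are chosen for this sketch only; the numbers are the ones reported on the slide.

```python
# Per-publisher counts reported for the CORE Publisher Connector.
publisher_connector = {
    "Elsevier": 1_191_785,
    "Springer": 540_889,
    "Frontiers": 65_927,
    "PLoS": 179_571,
}

# Articles harvested from repositories and full OA publishers via OAI-PMH.
repositories = 9_033_808

connector_total = sum(publisher_connector.values())
dataset_total = repositories + connector_total

print(f"Total publisher connector: {connector_total:,}")  # 1,978,172
print(f"Total dataset: {dataset_total:,}")                # 11,011,980
```

Both recomputed totals match the figures given on the slide.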
4. Promotion of the expertise directory
Knoth, P., Pontika, N., Anastasiou, L. Releasing 1.8 million open access publications from publisher systems for text and data mining, LSE Blog, http://blogs.lse.ac.uk/impactofsocialsciences/2018/03/22/releasing-1-8-million-open-access-publications-from-publisher-systems-for-text-and-data-mining/
5. TDM & Research Support Staff
• Establish and maintain a close collaboration with researchers
• Extensive experience in advocacy, e.g. for open access
• Knowledgeable about the repository’s collection
• Participate in the Academic Institution’s Research Committees
• Familiarity with Copyright issues and Creative Commons Licenses
6. Where to find TDM related material - I
3 TDM taxonomies developed by the OpenMinTED (OMTD) project:
• Text and Data Mining
• TDM Methods
• TDM Workflows
OMTD tutorials and courses: https://www.fosteropenscience.eu/openminted
7. Where to find TDM related material - II
• Educational training videos introducing TDM concepts
• Other TDM training materials
9. Introduction to TDM course - I
Created by OU and LIBER in collaboration with Cambridge University.
• First technical TDM course addressed to research support staff.
• Presents OMTD and explains how to use it.
• Hands-on examples of basic TDM processes