Talk at CNI 2015 Spring Membership Meeting in Seattle on April 14th, 2015, see http://www.cni.org/events/membership-meetings/upcoming-meeting/spring-2015/
Abstract: The goal of the InFoLiS project is to connect research data and publications. Links between data and literature are created automatically by means of text mining and made available as Linked Open Data (LOD) for seamless integration into different retrieval systems. This enables scientists to directly access information about corresponding research data in a literature information system, and, vice versa, it is possible to directly find different interpretations and analyses in the literature of the same research data. In our talk, we will describe our methods for generating the links and give insight into the Linked Data infrastructure including the services we are currently building. Most importantly, we will detail how our solutions can be used by other institutions and invite all interested participants to discuss with us their ideas and thoughts on the requirements for these services to ensure broad interoperability with existing systems and infrastructures. InFoLiS is a joint project by the GESIS – Leibniz Institute for the Social Sciences, Cologne, Mannheim University Library, and Mannheim University supported by a grant from the DFG – German Research Foundation.
Integration of research literature and data (InFoLiS)
1. Integration of research literature and data (InFoLiS)
Katarina Boland¹, Philipp Zumstein²
¹ GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
² Mannheim University Library, Mannheim, Germany
CNI 2015 Spring Membership Meeting
April 14th, 2015
2. Introduction
The InFoLiS project: Integration of research data and publications
InFoLiS I: 05/2011 - 05/2013
InFoLiS II: 08/2014 - 08/2016
InFoLiS is funded by the DFG (SU 647/2-1)
3. InFoLiS Project Goals
Catalogue: Publications (SSOAR (GESIS), Primo (UB MA), ...)
Data Catalogue: Research Data (da|ra (GESIS), ...)
[Diagram: queries and responses for both catalogues, connected by links]
4. Outline
Part I: Generation of Links
Part II: How can you reuse it?
6. Citation of Research Data
Recommendation¹:
Creator (Publication Date): Title. Publication Agent. Identifier
Creator (Publication Date): Title. Version. Publication Agent. Type of Resource. Identifier.
→ Extraction based on these patterns?
¹ see http://auffinden-zitieren-dokumentieren.de/zitieren/empfohlene-datenzitation/
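As a rough illustration of what extraction based on these patterns could look like, the sketch below matches the shorter recommended format (Creator (Publication Date): Title. Publication Agent. Identifier) with a regular expression. The regex, the function name `parse_citation`, and the sample citation are assumptions made for this example; they are not taken from the InFoLiS code.

```python
import re

# Hypothetical regex for the recommended format:
#   Creator (Publication Date): Title. Publication Agent. Identifier
CITATION = re.compile(
    r"^(?P<creator>[^()]+?)\s*"      # creator: everything before the date
    r"\((?P<date>\d{4})\):\s*"       # four-digit publication year in parentheses
    r"(?P<title>[^.]+)\.\s*"         # title up to the first full stop
    r"(?P<agent>[^.]+)\.\s*"         # publication agent up to the next full stop
    r"(?P<identifier>\S+)$"          # identifier, e.g. a DOI
)


def parse_citation(text):
    """Return the citation fields as a dict, or None if the format does not match."""
    match = CITATION.match(text.strip())
    return match.groupdict() if match else None


# Made-up citation that follows the recommendation.
sample = "Example Institute (2011): Example Survey 2010. Example Archive. doi:10.1234/example"
parsed = parse_citation(sample)

# A free-text mention of a dataset does not follow the format and is not matched.
free_text = "data from the Socio-Economic Panel (SOEP) of the years 1990 and 2003 are used"
missed = parse_citation(free_text)
```

Such a pattern only helps when authors actually follow the recommendation; free-text mentions of a dataset are not matched.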
7. References to Datasets
presentation and discussion of the empirical findings. For this purpose, data from the Socio-Economic Panel (SOEP) of the years 1990 and 2003 are used and for both periods, the impact factors are estimated using linear regression models.
data from the <title> of the years <year> are used
8. References to Datasets
Table 1: Population forecast for Germany depending on age cohorts - proportion in percent.
Data base: 10th Population Forecast of the Federal Statistical Office, variant 5.
(Data base: <number> <title> of the <publication agent>, variant <variant>)
9. References to Datasets
Consulted were furthermore ...
Consulted were furthermore <title1>, <title2>, <title3>, ..., <titleN>.
10. References to Datasets
Table 3: Sample of the surveys conducted in the years 2003 and 2004 as well as size of the sample, with valid data from both surveys
(Source: Ditton et al. 2005a)
(Source: <citation of descriptive publication>)
11. References to Datasets
...are hard to detect!
see also...
Green, Toby (2009). We Need Publishing Standards for Datasets and Data Tables. OECD Publishing White Paper. doi: 10.1787/603233448430
Altman, Micah and Gary King (2007). A Proposed Standard for the Scholarly Citation of Quantitative Data. In: D-Lib Magazine 13.3. url: http://www.dlib.org/dlib/march07/altman/03altman.html
12. Automatic Identification of References
Why not simply search for study titles in publications?
“ALLBUS/GGSS 1996 (Allgemeine Bevölkerungsumfrage der Sozialwissenschaften/German General Social Survey 1996)”
“ALLBUS 96”
“Youth 2010”
16. General idea
How do humans recognize study references?
Source: Estimations based on SOEP, wave 2002.
17. General idea
How do humans recognize study references?
Source: Estimations based on xyz, wave 2002.
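The two examples above suggest that the surrounding words, not the study name itself, mark a reference. A minimal sketch of that idea follows: the context around a known study name is turned into a pattern, which is then applied to text containing an unknown name. The function names, the context window of two words, and the plain regex construction are assumptions for illustration; this is a simplification and not the InFoLiS algorithm (see the TPDL 2012 paper for the actual method).

```python
import re


def induce_pattern(sentence, seed, window=2):
    """Build a regex from the words around a known study name (the seed)."""
    if seed not in sentence:
        raise ValueError("seed title does not occur in sentence")
    before, _, after = sentence.partition(seed)
    left = " ".join(before.split()[-window:])    # last words before the seed
    right = " ".join(after.split()[:window])     # first words after the seed
    # The seed itself is replaced by a capture group.
    return re.compile(re.escape(left) + r"\s+(.+?)\s*" + re.escape(right))


def find_titles(pattern, text):
    """Return every string that appears in the seed's position."""
    return [m.group(1) for m in pattern.finditer(text)]


pattern = induce_pattern("Source: Estimations based on SOEP, wave 2002.", "SOEP")
found = find_titles(pattern, "Source: Estimations based on xyz, wave 2002.")
```

Here the context "based on ..., wave" learned from the SOEP sentence is enough to pick out "xyz" as a candidate study name.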
19. Reference Extraction
for details see...
Katarina Boland, Dominique Ritze, Kai Eckert & Brigitte Mathiak (2012). Identifying References to Datasets in Publications. In: Proceedings of the Second International Conference on Theory and Practice of Digital Libraries (TPDL), Lecture Notes in Computer Science Volume 7489, pp. 150-161. Berlin: Springer. doi:10.1007/978-3-642-33290-6_17
21. Mapping to Datasets in da|ra: granularity of registration vs. citation
Strategies: 1) greedy; 2) exact; 3) best
22. Mapping to Datasets in da|ra
[Diagram: ALLBUS datasets at different levels of granularity: ALLBUS, ALLBUS 1996, ALLBUS 1998, ALLBUS 2000, ALLBUS 2000 CAPI/PAPI, ALLBUScompact, ALLBUScompact 2000, ALLBUScompact 2000 CAPI/PAPI, ALLBUScompact 2000 CAPI, ALLBUS - Cumulation 1980-2006, ALLBUS - Cumulation 1980-2008, ALLBUScompact - Cumulation 1980-2010, ...]
→ use ontology
23. Ontology: Approach
Vocabulary: e.g. DDI-RDF Discovery Vocabulary²
² Thomas Bosch, Richard Cyganiak, Arofan Gregory, Joachim Wackerow (2013): DDI-RDF Discovery Vocabulary: A Metadata Vocabulary for Documenting Research and Survey Data. In: Proceedings of the 6th Linked Data on the Web (LDOW) Workshop at the 22nd International World Wide Web Conference (WWW). CEUR Workshop Proceedings, pp. 46-55
26. Thank you for your attention!
katarina.boland@gesis.org
Next part: How can you reuse it?
33. (Internal) Data structure
Document
Pattern
Execution of Algorithm
Study Title
Study URI
Which other study titles are found with the new configuration of the algorithm?
How was a pattern derived?
Which studies are found in a document?
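One possible way to model these entities so that the three questions can be answered is sketched below. All class names, field names, and helper functions are hypothetical; the slide only names the entities, not their attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Execution:
    """One run of an algorithm with a given configuration (fields assumed)."""
    algorithm: str
    config: dict


@dataclass
class Pattern:
    regex: str
    derived_by: Execution        # provenance: answers "How was a pattern derived?"


@dataclass
class StudyReference:
    document_id: str
    study_title: str
    study_uri: Optional[str]     # None while the title is not yet mapped to a dataset
    pattern: Pattern
    found_by: Execution


def studies_in_document(refs, document_id):
    """Which studies are found in a document?"""
    return sorted({r.study_title for r in refs if r.document_id == document_id})


def titles_found_by(refs, execution):
    """Which study titles are found with a given configuration of the algorithm?"""
    return sorted({r.study_title for r in refs if r.found_by is execution})
```

Keeping a reference from each pattern and each found title back to the execution that produced it is what makes the provenance questions answerable.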
36. RESTful API (web services)
GET, POST, PUT, DELETE, PATCH resources
Search, perform algorithms, upload files
open for integration into other workflows, e.g. in
resource discovery systems
research data catalogues
digital repositories
possible to orchestrate over a web interface for individual use
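To make the integration idea concrete, the sketch below prepares two HTTP requests with Python's standard library without sending them. The base URL, the resource paths `/patterns` and `/executions`, and the JSON payload are placeholders invented for this example; the actual endpoints of the InFoLiS web services are not given in the slides.

```python
import json
from urllib.request import Request

BASE_URL = "https://infolis.example.org/api"  # hypothetical, not the real service


def build_request(method, path, payload=None):
    """Prepare (but do not send) a JSON request against the assumed API."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = Request(BASE_URL + path, data=data, method=method)
    req.add_header("Accept", "application/json")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# GET a collection of resources (path assumed).
search = build_request("GET", "/patterns")

# POST a new algorithm execution (path and payload assumed).
run = build_request(
    "POST", "/executions", {"algorithm": "pattern-search", "document": "doc-1"}
)
```

Passing either object to `urllib.request.urlopen` would send the request; that step is left out because the URL is a placeholder.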
41. Quoting the Horizon Report 2014
“Visionary leadership for research data management
models is also required to determine how to best
incorporate data connections into library catalogs” (NMC
Horizon Report 2014 - Library Edition, p. 7)
42. Current situation: Several steps needed
Common situation today:
Search online catalogue
Evaluate search results
Find the full text of the relevant source
Read the publication
Spot the research data
Moreover, often the reverse information is missing
completely
Which publications are built on some specific
research data?
43. Two Approaches
Clientside
load additional data in catalogue view (e.g. over Ajax)
enrich view, links
up-to-date data
Embed data in the web presentation
Serverside
add additional data in your catalogue database (e.g. Primo enrichment process)
enrich view, links, search, sort, filter
time-lagged because of the update mechanism
Do the data fit into existing infrastructure? (fields, tables, database)
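The server-side approach can be pictured as a batch job that copies link data into the catalogue records before they are indexed. The sketch below assumes records and links are plain dictionaries with the keys `id`, `publication_id`, and `dataset`; these names and the `research_data` field are invented for illustration and do not describe the Primo enrichment process itself.

```python
from collections import defaultdict


def enrich_records(records, links):
    """Return copies of catalogue records with a 'research_data' list added."""
    by_publication = defaultdict(list)
    for link in links:
        by_publication[link["publication_id"]].append(link["dataset"])
    # Records without links get an empty list, so the field is always present.
    return [
        {**rec, "research_data": by_publication.get(rec["id"], [])}
        for rec in records
    ]


records = [{"id": "p1", "title": "A"}, {"id": "p2", "title": "B"}]
links = [{"publication_id": "p1", "dataset": "ALLBUS 2010"}]
enriched = enrich_records(records, links)
```

Because the data are stored with the record, they can be searched, sorted, and filtered, but they are only as fresh as the last batch run. A client-side integration would instead fetch the links when the record is displayed.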
44. Integration as links
Link from catalogue entry ...
… to the corresponding research data
45. Integration as popup
Cited research data: 2
• ALLBUS 2010 (used in 512 publications)
• part of ALLBUS (used in 13,456 publications)
• own research data (used in 1 publication)
48. Enrich your research data catalogue
Cited in: Ritze, D., Paulheim, H., & Eckert, K. (2013). Evaluation Measures for Ontology Matchers in Supervised Matching Scenarios. In The Semantic Web – ISWC 2013 (p. 392–407).
Tags from Publication: Supervised Ontology Matching, Evaluation, Recall, Precision, F-Measure, Precision@N-Curves, ROC-Curves, Precision-Recall-Curves
49. Current Goals of the Project
1. Expansion to other disciplines and languages
2. Linked data based infrastructure
3. Improve the reusability of generated links
50. Dissemination
our web services will be open for everyone
project webpage
http://infolis.github.io/
background information,
slides, publications, news
Additionally our code is open source
https://github.com/infolis
you can install/try out everything locally
development of code
51. Questions, Discussions, Feedback
Questions?
Discussions
Give us feedback
Small online survey: http://t1p.de/infolis
http://wiki.bib.uni-mannheim.de/limesurvey/index.php?sid=55594