Presentation slide for this:
Kei Kurakawa, Toward universal information access on the digital object cloud, In book of abstracts of International Workshop on Data Science - Present & Future of Open Data & Open Science -, p.57-59, November 12-15, 2018, Mishima Citizens Cultural Hall & Joint Support-Center for Data Science Research, Mishima, Shizuoka, Japan
Toward universal information access on the digital object cloud
1. Toward universal information
access on the digital object cloud
Kei Kurakawa
National Institute of Informatics
orcid.org/0000-0002-7031-1846
2. Research Data Alliance
• Founded in 2013
• Motto
– Research data sharing without barriers
• Participants
– Domain scientists, information specialists, disciplinary data managers, curators, engineers, librarians,
software engineers, policy makers, etc.
• Plenaries
– RDA1 Gothenburg, Sweden, March 2013
– RDA2 Washington DC, US, September 2013
– RDA3 Dublin, Ireland, March 2014
– RDA4 Amsterdam, the Netherlands, September 2014
– *RDA5 San Diego, US, March 2015
– *RDA6 Paris, France, September 2015
– *RDA7 Tokyo, Japan, March 2016
– *RDA8 Denver, US, September 2016
– *RDA9 Barcelona, Spain, April 2017
– RDA10 Montreal, Canada, September 2017
– *RDA11 Berlin, Germany, March 2018
– RDA12 Gaborone, Botswana, November 2018
* indicates that I participated in it.
3. Outline
• Chronological overview of universal
information access
• Professional data community
• Digital object cloud (DOC) and linked data (LD)
• Persistent identifiers (PID)
• PID centric approach to data management and
access
• Conclusions and future work
4. Universal information access
• The quest for universal information access in networks began around 1960
and over the years yielded a set of principles to fully support universal
information access. [Denning and Kahn, 2010]
– Memex (Vannevar Bush, “As we may think” Atlantic Monthly, 1945)
• The first visionary speculation. It stored documents on microfilm and allowed
annotations and cross links.
– Xanadu (Ted Nelson, early 1960s)
• It introduced topics such as hypertext, hyperlinks, automatic version management,
automatic inclusion of referenced items, and small payments to authors for use of their
materials.
– NLS (Doug Engelbart, mid-1960s)
• It was the first working hypertext system with a graphical user interface, a mouse, and
collaboration tools.
– World Wide Web (Tim Berners-Lee, late 1980s)
• It is a potential means to implement Bush’s, Nelson’s and Engelbart’s ideas of knowledge
representation on the Internet.
– Digital Object Architecture (DOA) (CNRI, late 1980s) [Kahn and Wilensky, 1995]
• It sets out key principles of information access in a network environment (the Internet),
distilled and unified from digital library projects.
Denning, P. J., & Kahn, R. E. (2010). The long quest for universal information access. Communications of the ACM, 53(12), 34. http://doi.org/10.1145/1859204.1859218
6. Premises in the professional data
community
• Computational data formats should be
uncomplicated, everlasting, and independent of
changes in computer technology.
• Data schemas and data attributes are
complicated at a professional level.
• Only the professionals can deal with data
processing and management.
• Of course, the professionals have good
knowledge of the domain.
8. Practices in the professional data
community
Data Fabric IG, Group details, https://www.rd-alliance.org/group/data-fabric-ig.html
The data cycle is based on a multidisciplinary survey of the nature, creation, and
usage of Persistent Identifiers (PIDs).
Peter Wittenburg, Margareta Hellström, and Carlo-Maria Zwölf (eds.), Hossein Abroshan, Ari Asmi, Giuseppe Di Bernardo, Danielle Couvreur, Tamas
Gaizer, Petr Holub, Rob Hooft, Ingemar Häggström, Manfred Kohler, Dimitris Koureas, Wolfgang Kuchinke, Luciano Milanesi, Joseph Padfield, Antonio
Rosato, Christine Staiger, Dieter van Uytvanck and Tobias Weigel (2017): Persistent identifiers: Consolidated assertions. DOI: 10.15497/RDA00027.
9. Digital Object Architecture (DOA)
[Kahn and Wilensky, 1995]
• Digital object (DO)
– Any unit of information represented in digital form may be structured as a digital object within
the Internet.
– The structure of a DO, including metadata, is machine and platform independent.
• A unique, persistent identifier (called a “handle”)
– Every DO has a unique identifier that can distinguish a DO from every other object, present,
past, or future.
• Handle System
– The “resolution” system maps handles to state information that includes location,
authentication, rights specifications, allowed operations, and object attributes.
• DO repositories
– DOs can be stored in DO Repositories, which are searchable systems.
– Access to an instance of a DO Repository is made via a standard DO protocol (DOP) that
restricts the actions allowed.
• DO registries
– They allow users to reference, federate, and otherwise manage collections across multiple
repositories and allow for full access control.
Kahn, R. E. and Wilensky, R. A framework for distributed digital object services. International Journal on Digital Libraries 6, 2 (2006). DOI: 10.1007/s00799-005-0128-x.
(First made available on the Internet in 1995 and reprinted in 2006 as part of a collection of seminal papers on digital libraries.)
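The resolution step on this slide (a handle maps to state information, and allowed operations limit what can be done with the object) can be sketched with a toy in-memory resolver. This is only an illustration under stated assumptions: `HandleRecord`, `ToyHandleSystem`, and `perform` are hypothetical names, the handle and URL in the usage note are made up, and none of this is the real Handle System or the DO protocol.

```python
from dataclasses import dataclass, field


@dataclass
class HandleRecord:
    """State information a handle resolves to (fields follow the slide's list)."""
    location: str
    rights: str = "unspecified"
    allowed_operations: frozenset = frozenset({"read"})
    attributes: dict = field(default_factory=dict)


class ToyHandleSystem:
    """In-memory stand-in for a resolution service (hypothetical, not the real system)."""

    def __init__(self):
        self._records = {}

    def register(self, handle, record):
        # A handle must stay unique, so re-registration is refused.
        if handle in self._records:
            raise ValueError(f"handle already registered: {handle}")
        self._records[handle] = record

    def resolve(self, handle):
        # Raises KeyError for an unknown handle.
        return self._records[handle]


def perform(system, handle, operation):
    """Allow an operation only if the handle's state information permits it."""
    record = system.resolve(handle)
    if operation not in record.allowed_operations:
        raise PermissionError(f"{operation!r} not allowed on {handle}")
    return f"{operation} {record.location}"
```

For instance, after registering `HandleRecord(location="https://repo.example.org/objects/1")` under a made-up handle, `perform(system, handle, "read")` succeeds while `perform(system, handle, "delete")` raises `PermissionError`.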
10. Digital object cloud
Larry Lannom, Peter Wittenburg, Global Digital Object Cloud (DOC) - A Guiding Vision, 11 September 2016,
http://hdl.handle.net/11304/a8877a1a-9010-428f-b2ce-5863cec4aff3
11. Linked data
Linked Data - Connect Distributed Data across the Web
http://linkeddata.org https://www.w3.org/2007/03/layerCake.png
Semantic technology layer cake
12. Mixture of DOC and LD on the Internet
information space
Persistent identifiers are the key to bridging the gap between
the digital object cloud (DOC) and linked data (LD).
Digital object cloud
Linked data
The Internet
A node represents a resource with a persistent identifier.
13. Varieties of academic persistent
identifiers and management systems
[Figure: varieties of academic persistent identifiers, arranged by PID (persistent identifier) entity type. Labels in the figure:]
• PID entity types: Research resource, Researcher, Organization, Grant
• Research resource: Digital object (DO), Research data, Research sample, Research instrument?, Concept, Taxonomy?, Classification
• Identifiers and systems: Handle System; ePIC, etc.; DOI; CrossRef, DataCite, etc.; ARK; PURL; URI / URN; ORCID; ISNI?; GRID?; Ringgold?; OrgID?, Project?; CrossRef Funder?; ISBN, LSID, ChEBI, Perma.cc, etc.
• Also labelled: Publisher articles / figures, Data citation, IGSN, etc.
• Federated Identity Management: eduPersonOrgOrcid, eduPersonOrgDN
• Meta-resolver / Handle, DOI, ARK, PURL resolver
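The "meta-resolver" label on this slide suggests one entry point that forwards each identifier to the resolver for its scheme. A minimal sketch of that idea follows. The `scheme:value` input convention, the `RESOLVERS` table, and the function name `to_resolver_url` are assumptions of this sketch, and the table is illustrative rather than complete.

```python
# Resolver base URLs for a few PID schemes. The mapping is illustrative,
# not a complete or authoritative list.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    "ark": "https://n2t.net/ark:",
    "orcid": "https://orcid.org/",
}


def to_resolver_url(pid: str) -> str:
    """Turn a 'scheme:value' identifier into an actionable resolver URL."""
    scheme, sep, value = pid.partition(":")
    if not sep or not value:
        raise ValueError(f"expected 'scheme:value', got {pid!r}")
    try:
        base = RESOLVERS[scheme.lower()]
    except KeyError:
        raise ValueError(f"no resolver known for scheme {scheme!r}") from None
    return base + value
```

With this convention, `to_resolver_url("doi:10.15497/RDA00027")` yields a `https://doi.org/` URL for the PID survey cited earlier, and an identifier with an unlisted scheme raises `ValueError` instead of being guessed at.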
14. Data consumer scenario
[Figure: data consumer scenario on the Global Digital Object Cloud. Labels in the figure:]
• Data discovery & automatic data processing
• Data fabric
• PID (Persistent Identifier), data typing, data versioning, data provenance, data collection, data trustworthiness, dynamic data citation
15. Google dataset search (Beta)
Google Dataset Search (beta) was released on 2018-09-05.
The Data Discovery Paradigms IG of RDA held a discussion with Dr. Natasha Noy of Google Research
in November 2017.
https://www.blog.google/products/search/making-it-easier-discover-datasets/
http://g.co/datasetsearch
Data providers are expected to publish descriptive metadata in
schema.org vocabulary so that their sites are discoverable.
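As a rough illustration of what such descriptive metadata can look like, the sketch below builds a minimal schema.org `Dataset` description as JSON-LD and wraps it for embedding in a landing page. The helper names `dataset_jsonld` and `as_script_tag` are hypothetical, only a handful of schema.org properties are used, and the set of properties a real provider needs is not specified on the slide.

```python
import json


def dataset_jsonld(name, description, url, identifier, creator_name):
    """Build a minimal schema.org Dataset description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
        "creator": {"@type": "Person", "name": creator_name},
    }


def as_script_tag(doc):
    """Wrap the JSON-LD so it can be embedded in a dataset landing page."""
    body = json.dumps(doc, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

Using a persistent identifier (for example a DOI URL) as the `identifier` value ties the discovery metadata back to the PID centric approach discussed in these slides.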
16. PID centric approach to data
management and access
[Figure: Data type registries]
[Figure: Kernel information on Local Handle Service]
Broeder, D., & Lannom, L. (2014). Data Type Registries: A Research Data
Alliance Working Group. D-Lib Magazine, 20(1/2).
http://doi.org/10.1045/january2014-broeder
Tobias Weigel, Beth Plale, Mark Parsons, Gabriel Zhou, Yu Luo, Ulrich
Schwardmann, Robert Quick, Margareta Hellström, Kei Kurakawa, “RDA
Recommendation on PID Kernel Information (Draft)”,
https://www.rd-alliance.org/sites/default/files/RDA%20Recommendation%20on%20PID%20Kernel%20Information.pdf
17. Data providing with data types
• In parallel, we need data typing.
• Data providing maturity levels
– Level 1
• Data providers build their data in a community standard
– The data is packed in a commonly used format, e.g. XML, JSON, netCDF, or CSV, or in
an application-dependent format such as Microsoft Excel.
– Some data are shipped with a document describing data meaning, data types,
and data format.
– Level 2
• Data providers use a more complicated data format to assert data types
– A Handle server of DOI objects, combined with a Kernel Information profile and a Data
Type Registry, is a recommended candidate for a variety of domain communities
to assert their data types, in addition to providing their data sources in a community
standard format.
– On the other hand, the linked data community uses RDF/XML, JSON-LD, and other
linked data formats, or a format that mixes data types and values.
– Common vocabularies are provided on public servers, e.g. schema.org.
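The difference between the two levels can be sketched as follows: at Level 2 a consumer looks up the data type asserted for an object and lets a registry choose the parser, instead of guessing from the file. Everything here is a toy under stated assumptions: the PIDs are made up, `TYPE_REGISTRY` and `KERNEL_INFO` are in-memory stand-ins, and the `data_type` field name is hypothetical rather than taken from the RDA Kernel Information recommendation.

```python
import csv
import io
import json

# Toy data type registry: type PID -> parser. The PIDs are made up.
TYPE_REGISTRY = {
    "21.T99999/csv-table": lambda text: list(csv.DictReader(io.StringIO(text))),
    "21.T99999/json-doc": json.loads,
}

# Toy kernel-information-like records: object PID -> small metadata record.
# The "data_type" field name is an assumption of this sketch.
KERNEL_INFO = {
    "21.T99999/obj-1": {"data_type": "21.T99999/csv-table"},
    "21.T99999/obj-2": {"data_type": "21.T99999/json-doc"},
}


def load(object_pid, payload):
    """Pick a parser from the object's asserted data type, not from guesswork."""
    type_pid = KERNEL_INFO[object_pid]["data_type"]
    parser = TYPE_REGISTRY[type_pid]
    return parser(payload)
```

The design point is that the consumer code never inspects the payload to decide how to read it; adding a new format means registering a new type, not changing `load`.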
18. On the Digital Object Cloud
• Kernel information connects data and data types.
• We need to handle the graph structure of the data.
[Figure: attribute augmented graph with a data layer, a data type layer, and a Kernel Information metadata layer]
Kei Kurakawa, Takayuki Sekiya, Yasumasa Baba, Making data typing efforts or automatically detecting data types
for automatic data processing?, Research Data Alliance 11th Plenary Meeting, Berlin, Germany, 2018.03.21-23
https://www.rd-alliance.org/sites/default/files/rda11_poster_20180321_kurakawa.pdf
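One way to picture the layered graph on this slide is a small attribute-carrying graph in which kernel information nodes sit between data nodes and data type nodes. The sketch below is an assumption-laden illustration, not the structure from the poster: the class `LayeredGraph`, the layer names `"data"`, `"kernel"`, and `"type"`, and the relation labels in the usage note are all hypothetical.

```python
from collections import defaultdict


class LayeredGraph:
    """Nodes carry attributes; each node sits in one named layer."""

    def __init__(self):
        self.nodes = {}                # pid -> {"layer": ..., **attributes}
        self.edges = defaultdict(set)  # pid -> set of (relation, target pid)

    def add_node(self, pid, layer, **attributes):
        self.nodes[pid] = {"layer": layer, **attributes}

    def add_edge(self, source, relation, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[source].add((relation, target))

    def neighbours(self, pid, layer):
        """Targets one hop from pid that belong to the given layer, sorted."""
        return sorted(t for _, t in self.edges[pid] if self.nodes[t]["layer"] == layer)


def data_type_of(graph, data_pid):
    """Follow data -> kernel information -> data type."""
    types = []
    for kernel_pid in graph.neighbours(data_pid, "kernel"):
        types.extend(graph.neighbours(kernel_pid, "type"))
    return types
```

A data node linked to a kernel node by a relation such as `describedBy`, and that kernel node linked to a type node by `hasType`, lets `data_type_of` recover the type without any direct edge between the data and type layers.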
19. Conclusions and future work
• Two ways of accessing data on the Internet
– Digital object cloud (DOC)
– Linked data (LD)
• PID centric approach to data management and access
– Data consumer scenario
• Data search
• Automatic data processing
• On the digital object cloud, we need more functional research on
– data typing,
– handling the graph structure of data resources,
– case studies.