The Many and the One
             BCE problems in 21st c. data curation

Tracking it Back to the Source: Managing and Citing Research Data
                NISO Forum, Denver, Sept 24, 2012

                              Allen H. Renear
            Graduate School of Library and Information Science
                University of Illinois at Urbana-Champaign
                       Principal researchers of material presented:
      David Dubin, Karen M. Wickett, Simone Sacchi, Richard Urban, Allen H. Renear
               Center for Informatics Research in Science and Scholarship
                   Graduate School of Library and Information Science
                        University of Illinois at Urbana-Champaign


                        NSF/OCI-ITR DataNet Award #0830976
                         IMLS/LB Award #RE-05-08-0062-08
Problems, Problems, Problems
Identity problems:
   – Is this the data we think it is? Is it the same data as that data?
                        (involves issues of authenticity, integrity, encoding)
Meaning problems:
   – What is this data supposed to be telling us?
                       (involves interpreting the semantics of the data)
Relationship problems:
   – How is this data related to that data?
                       (involves issues of data provenance)
Integration problems:
   – How can I combine this data with other data?
                     (involves harmonizing conflicts at multiple levels)
Interoperation problems:
   – How can I get this data to work with my software?
                       (involves conversion to equivalent formats)
An issue underlying all these is representation…
       how do digital files represent facts about the world?
Identity Problems



Two scientists, Jill and John,
   used the same data.


   What does that mean?

    And how can we tell?
Identity Problems



         Compare:

Two scientists, Jill and John,
 used the same statistician.
Identity Problems



         Compare:

Two scientists, Jill and John,
 used the same centrifuge.
Identity and Representation Levels
Consider two files with the

    … same data,
                      but relational tables in one case
                           and RDF triples in another

    … with the same data and the same RDF triples,
                     but an XML serialization in one case,
                         an N3 serialization in another

    … with the same data, the same RDF triples, the same N3 serialization,
                     but UTF-8 character encoding in one case
                         and UTF-16 encoding in another

    How many levels do we need? How do we define and manage them?
    How can they be identified and re-identified?
    Which identifier schemes for which level?
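
A minimal sketch of the last case above (same characters, different character encoding), assuming a made-up one-line payload and SHA-256 as a stand-in for a bit-level identifier. Neither the payload nor the choice of hash comes from the slides.

```python
import hashlib

# Hypothetical content; any text would do.
text = "sample_31 U22376 408"

# Two encodings of the same character sequence.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16")  # Python prepends a byte-order mark here


def digest(data: bytes) -> str:
    """Bit-level fingerprint: identifies a byte sequence, nothing higher."""
    return hashlib.sha256(data).hexdigest()


# Identity at the bit-stream level: the two files differ.
same_bits = digest(utf8_bytes) == digest(utf16_bytes)

# Identity at the character level: decode first, then compare.
same_characters = utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16")

print("same bit stream:", same_bits)
print("same characters:", same_characters)
```

A checksum answers the "same data?" question only at the lowest level; a comparison at any higher level first has to undo the encodings beneath it.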
What is a dataset anyway?!


Maybe we should ask a scientist
    They’ll have an answer, right?




There are almost as many answers as scientists




Cries from the heart

 “ the terms ‘Data Product’, ‘Data Set,’ and ‘Version’
     are overlaid with multiple meanings between
                     communities.”
                  (Barkstrom, 2009)
“There is ambiguity in what type of object a dataset is;
   with different groups of users applying different
                     connotations.
   There needs to be an explicit statement of what
  the intended preservation of a dataset will imply.”
                    (Pepler, 2008)

Forcing us to conclude…

 No single object can possibly have all those attributes

Therefore it is impossible to give the common colloquial
         notion of dataset a precise definition

              It must instead be replaced
       by a family of new more specific concepts

                    Sound familiar?


FRBR




A FRBR inspired solution

FRBR eliminates the ordinary “book” from our world
       The ordinary “book” can be simultaneously
              about chordata,
              in French,
              typeset in neo-Bauhaus,
              mustard-stained

but FRBR replaces the book with four objects
              the work              is about chordata,
              the expression         is in French,
              the manifestation      is typeset in neo-Bauhaus,
              the item              is mustard-stained
FRBR entities and attributes
   Work:              “an … intellectual or artistic creation”
   Expression:        “the … realization of a work … notation … etc.”
   Manifestation:     “the physical embodiment of an expression of a work”.
   Item:              “a single exemplar of a manifestation”

Attribute assignments characteristically disjoint
     A work may have a subject.
          It does not have a language, typeface, or condition.
     An expression may have a language;
          It does not have a subject.
               (or a typeface or a condition).
     A manifestation may have a typeface.
          It does not have a subject or a language
               (or a condition)
     An item may have a condition.
          It does not have a subject, language, or typeface.
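
The disjoint attribute assignments above can be sketched as four separate types. This is an illustrative simplification, not a FRBR implementation: the class and field names are assumptions, and each link is modeled as a single reference for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    subject: str  # only a work has a subject


@dataclass(frozen=True)
class Expression:
    realizes: Work
    language: str  # only an expression has a language


@dataclass(frozen=True)
class Manifestation:
    embodies: Expression
    typeface: str  # only a manifestation has a typeface


@dataclass(frozen=True)
class Item:
    exemplifies: Manifestation
    condition: str  # only an item has a condition


work = Work(subject="chordata")
expression = Expression(realizes=work, language="French")
manifestation = Manifestation(embodies=expression, typeface="neo-Bauhaus")
item = Item(exemplifies=manifestation, condition="mustard-stained")

# The item itself has no subject ...
item_has_subject = hasattr(item, "subject")
# ... the subject is reached only by following the chain back to the work.
subject_via_chain = item.exemplifies.embodies.realizes.subject

print(item_has_subject, subject_via_chain)
```

The everyday "book" that is about chordata and also mustard-stained corresponds to no single object here; it is the whole chain.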
Entities? Really?




Aren’t some of those rectangles just nominalized relationships?



Ambiguities

Is

<object name="sample_31">
   <feature name="U22376" value="408" />
   <feature name="X59417" value="1784" />

An expression?

Is “00001011” an expression?
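
A small sketch of why the bit string on its own does not settle the question: the same eight symbols support several readings, depending on which interpretation convention is assumed. The three readings below are illustrative choices, not ones named in the slides.

```python
bits = "00001011"

# Reading 1: an unsigned binary integer.
as_unsigned_int = int(bits, 2)

# Reading 2: one byte holding an ASCII character code.
as_ascii_char = bytes([as_unsigned_int]).decode("ascii")

# Reading 3: eight independent true/false flags.
as_flags = [bit == "1" for bit in bits]

print(as_unsigned_int)
print(repr(as_ascii_char))
print(as_flags)
```

Nothing in the string selects among these readings; that information has to come from somewhere outside it.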




FRBR Refactored

      Story
         M:M
      Symbol Structure
         M:M
      Symbol Structure
         M:M
      Matter & Energy
FRBR refactored and applied to datasets

All relationships are M:M.
Levels named on the slide: semantic, syntax, encoding, instantiation.

      C1: observations
               expressed by…
      S1: RDF triples
               encoded by…
      S2: N3 statements
               encoded by…
      S3: Unicode characters
               encoded by…
      S4: UTF-8 bit streams
               inscribed in…
      M1: RAID array state

Based on the Systematic Assertion Model (SAM) for modeling datasets,
developed by David Dubin et al.
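
The symbol-structure part of this chain (S1 to S4) can be walked through in a few lines. This is a sketch under stated assumptions: the example.org IRIs and the to_n3 helper are hypothetical, the content level C1 and the physical level M1 are not modeled, and this is not the SAM formalization itself.

```python
# S1: one RDF triple, modeled as a plain tuple (hypothetical IRIs and value).
triple = ("http://example.org/sample_31", "http://example.org/U22376", "408")


def to_n3(t: tuple) -> str:
    """Hypothetical helper: write one triple as a single N3 statement."""
    subject, predicate, obj = t
    return f'<{subject}> <{predicate}> "{obj}" .'


# S2: the N3 statement that encodes the triple.
n3_statement = to_n3(triple)

# S3: the Unicode characters of the statement, as code points.
code_points = [ord(ch) for ch in n3_statement]

# S4: the UTF-8 bit stream that encodes those characters.
bit_stream = n3_statement.encode("utf-8")

# Each level is a different kind of object.
levels = {"S1": triple, "S2": n3_statement, "S3": code_points, "S4": bit_stream}
for name, value in levels.items():
    print(name, type(value).__name__)

# Walking back up from S4 recovers the S2 statement.
round_trip = bit_stream.decode("utf-8") == n3_statement
print("round trip ok:", round_trip)
```

Because every link is many-to-many, the same triple could equally be reached from a different serialization or a different character encoding.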
Identifiers


What do we identify with identifiers?
 An entity?
       Content
       Symbol structures
       Patterned matter and energy
 A nominalized relationship?

How do we confirm identification?

Identification

How do we identify an expression?
How do we identify an encoding?
How do we identify the data?

On the practical side we do this every day

On the theoretical side it is very difficult to usefully formalize.




Identity and change problems in Planets
From the Planets Conceptual Data Model, Sharpe et al. (2006)




Identity and change problems in Planets


• A file is a bitstream
• A file can be modified
• But a bitstream cannot be modified.
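
One possible way to see the tension in the three statements above, sketched under the assumption that a file *has* a bitstream rather than *is* one. This reading and the File class are illustrative only; the slides do not say how the authors resolve the problem.

```python
class File:
    """A named, persistent thing that has some bitstream at any given time."""

    def __init__(self, name: str, content: bytes) -> None:
        self.name = name
        self.bitstream = content  # Python bytes objects are immutable

    def modify(self, new_content: bytes) -> None:
        # "Modifying" the file swaps in a different bitstream;
        # the previous bitstream itself is untouched.
        self.bitstream = new_content


original = b"\x0b"
f = File("sample_31.dat", original)  # hypothetical file name
identity_before = id(f)

f.modify(b"\x0c")

same_file = id(f) == identity_before          # still the same file object
file_content_changed = f.bitstream != original
original_unchanged = original == b"\x0b"      # the old bitstream did not change

# A bitstream cannot be modified in place.
try:
    original[0] = 0x0C  # type: ignore[index]
    mutation_rejected = False
except TypeError:
    mutation_rejected = True

print(same_file, file_content_changed, original_unchanged, mutation_rejected)
```

If a file were literally identical to its bitstream, the modify step would have no coherent meaning, which is the inconsistency the slide points at.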



Credits to Dave Dubin, Simone Sacchi, Karen Wickett. Data Concepts Group, Data
Conservancy (NSF/OCI-ITR DataNet Award #0830976)




Center for Informatics Research
                in Science and Scholarship (CIRSS)
               Graduate School of Library and Information Science
                  University of Illinois at Urbana-Champaign

Director: Carole L. Palmer
Associate Director: Cathy Blake
    c. 12 affiliated GSLIS faculty; 8 PhD students.

CIRSS research groups:
    Data Practices:                      social science of information work
    Socio-Technical Data Analytics:      algorithms + people
    *Data Concepts:                      modeling for integration/computation

Professional Education:
Data curation specialization within an ALA-accredited LIS program
Other options are being planned

CIRSS Data Concepts Group


Rationale
  Integration and interoperability require robust formal conceptual
  models for scientific data,
  especially if semantic technologies are going to be exploited.
  Our current models aren’t good enough.

Mission
  The Data Concepts Group takes a logic-based approach to
  solving conceptual modeling problems in scientific data curation.
Questions?


This research is being carried out by the Data Concepts Group at the Center for
Informatics Research in Science and Scholarship (CIRSS) at the University of Illinois at
Urbana-Champaign,
Carole L. Palmer, Director.

                           Principal contributors include
    David Dubin, Karen M. Wickett, Simone Sacchi, Richard Urban, Allen H. Renear


                      NSF/OCI-ITR DataNet Award #0830976
                       IMLS/LB Award #RE-05-08-0062-08

Más contenido relacionado

La actualidad más candente

Technologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic RecordsTechnologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic Recordspbajcsy
 
IASSIST identifiers By Joan Starr
IASSIST identifiers By Joan StarrIASSIST identifiers By Joan Starr
IASSIST identifiers By Joan StarrCarly Strasser
 
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...Dimitrios Koureas
 
Small Data: Bridging the Gap Between Generic and Specific Repositories
Small Data: Bridging the Gap Between Generic and Specific RepositoriesSmall Data: Bridging the Gap Between Generic and Specific Repositories
Small Data: Bridging the Gap Between Generic and Specific RepositoriesAnita de Waard
 
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
A Linked Data Prototype for the Union Catalog of Digital Archives TaiwanA Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwanandrea huang
 
HiTIME project
HiTIME projectHiTIME project
HiTIME projectvty
 
Preserving the Inputs and Outputs of Scholarship
Preserving the Inputs and Outputs of ScholarshipPreserving the Inputs and Outputs of Scholarship
Preserving the Inputs and Outputs of Scholarshiptsbbbu
 
Exploring Process Barriers to Release Public Sector Information in Local Gove...
Exploring Process Barriers to Release Public Sector Information in Local Gove...Exploring Process Barriers to Release Public Sector Information in Local Gove...
Exploring Process Barriers to Release Public Sector Information in Local Gove...Peter Conradie
 
Type inference through the analysis of Wikipedia links
Type inference through the analysis of Wikipedia linksType inference through the analysis of Wikipedia links
Type inference through the analysis of Wikipedia linksAndrea Nuzzolese
 
Scott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data delugeScott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data delugeGigaScience, BGI Hong Kong
 
Challenges Distributed Information Retrieval [RBY] (ICDE 2007 Turkey)
Challenges Distributed Information Retrieval [RBY] (ICDE 2007 Turkey)Challenges Distributed Information Retrieval [RBY] (ICDE 2007 Turkey)
Challenges Distributed Information Retrieval [RBY] (ICDE 2007 Turkey)Carlos Castillo (ChaTo)
 
RDAP13 Mark Leggott: Stewarding research data using the Islandora framework
RDAP13 Mark Leggott: Stewarding research data using the Islandora frameworkRDAP13 Mark Leggott: Stewarding research data using the Islandora framework
RDAP13 Mark Leggott: Stewarding research data using the Islandora frameworkASIS&T
 
The Brain Imaging Data Structure and its use for fNIRS
The Brain Imaging Data Structure and its use for fNIRSThe Brain Imaging Data Structure and its use for fNIRS
The Brain Imaging Data Structure and its use for fNIRSRobert Oostenveld
 
RDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research DataRDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research DataGudmundur Thorisson
 
Parul Sharma Sally Vermaaten Right Combination
Parul Sharma Sally Vermaaten Right CombinationParul Sharma Sally Vermaaten Right Combination
Parul Sharma Sally Vermaaten Right CombinationFuture Perfect 2012
 
Inquiry Optimization Technique for a Topic Map Database
Inquiry Optimization Technique for a Topic Map DatabaseInquiry Optimization Technique for a Topic Map Database
Inquiry Optimization Technique for a Topic Map Databasetmra
 
CSC1100 - Chapter08 - Database Management
CSC1100 - Chapter08 - Database ManagementCSC1100 - Chapter08 - Database Management
CSC1100 - Chapter08 - Database ManagementYhal Htet Aung
 

La actualidad más candente (19)

Technologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic RecordsTechnologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic Records
 
Whither Small Data?
Whither Small Data?Whither Small Data?
Whither Small Data?
 
IASSIST identifiers By Joan Starr
IASSIST identifiers By Joan StarrIASSIST identifiers By Joan Starr
IASSIST identifiers By Joan Starr
 
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...
 
Small Data: Bridging the Gap Between Generic and Specific Repositories
Small Data: Bridging the Gap Between Generic and Specific RepositoriesSmall Data: Bridging the Gap Between Generic and Specific Repositories
Small Data: Bridging the Gap Between Generic and Specific Repositories
 
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
A Linked Data Prototype for the Union Catalog of Digital Archives TaiwanA Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
 
HiTIME project
HiTIME projectHiTIME project
HiTIME project
 
Preserving the Inputs and Outputs of Scholarship
Preserving the Inputs and Outputs of ScholarshipPreserving the Inputs and Outputs of Scholarship
Preserving the Inputs and Outputs of Scholarship
 
Exploring Process Barriers to Release Public Sector Information in Local Gove...
Exploring Process Barriers to Release Public Sector Information in Local Gove...Exploring Process Barriers to Release Public Sector Information in Local Gove...
Exploring Process Barriers to Release Public Sector Information in Local Gove...
 
Type inference through the analysis of Wikipedia links
Type inference through the analysis of Wikipedia linksType inference through the analysis of Wikipedia links
Type inference through the analysis of Wikipedia links
 
Scott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data delugeScott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data deluge
 
Challenges Distributed Information Retrieval [RBY] (ICDE 2007 Turkey)
Challenges Distributed Information Retrieval [RBY] (ICDE 2007 Turkey)Challenges Distributed Information Retrieval [RBY] (ICDE 2007 Turkey)
Challenges Distributed Information Retrieval [RBY] (ICDE 2007 Turkey)
 
RDAP13 Mark Leggott: Stewarding research data using the Islandora framework
RDAP13 Mark Leggott: Stewarding research data using the Islandora frameworkRDAP13 Mark Leggott: Stewarding research data using the Islandora framework
RDAP13 Mark Leggott: Stewarding research data using the Islandora framework
 
The Brain Imaging Data Structure and its use for fNIRS
The Brain Imaging Data Structure and its use for fNIRSThe Brain Imaging Data Structure and its use for fNIRS
The Brain Imaging Data Structure and its use for fNIRS
 
RDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research DataRDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research Data
 
Parul Sharma Sally Vermaaten Right Combination
Parul Sharma Sally Vermaaten Right CombinationParul Sharma Sally Vermaaten Right Combination
Parul Sharma Sally Vermaaten Right Combination
 
Inquiry Optimization Technique for a Topic Map Database
Inquiry Optimization Technique for a Topic Map DatabaseInquiry Optimization Technique for a Topic Map Database
Inquiry Optimization Technique for a Topic Map Database
 
CSC1100 - Chapter08 - Database Management
CSC1100 - Chapter08 - Database ManagementCSC1100 - Chapter08 - Database Management
CSC1100 - Chapter08 - Database Management
 
Data managementbasics issr_20130301
Data managementbasics issr_20130301Data managementbasics issr_20130301
Data managementbasics issr_20130301
 

Similar a Managing Research Data Identity and Relationships

Effective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsEffective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsAndre Freitas
 
Some thoughts about the gaps across languages and domains through the experi...
Some thoughts about the gaps across languages and domains through the experi...Some thoughts about the gaps across languages and domains through the experi...
Some thoughts about the gaps across languages and domains through the experi...National Institute of Informatics (NII)
 
Similarity on DBpedia
Similarity on DBpediaSimilarity on DBpedia
Similarity on DBpediaSamantha Lam
 
More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?Paul Groth
 
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWC
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWCFueling the future with Semantic Web patterns - Keynote at WOP2014@ISWC
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWCValentina Presutti
 
Semantic Similarity Assessment to Browse Resources exposed as Linked Data: an...
Semantic Similarity Assessment to Browse Resources exposed as Linked Data: an...Semantic Similarity Assessment to Browse Resources exposed as Linked Data: an...
Semantic Similarity Assessment to Browse Resources exposed as Linked Data: an...Riccardo Albertoni
 
5 Lessons Learned from Designing Neural Models for Information Retrieval
5 Lessons Learned from Designing Neural Models for Information Retrieval5 Lessons Learned from Designing Neural Models for Information Retrieval
5 Lessons Learned from Designing Neural Models for Information RetrievalBhaskar Mitra
 
AI Beyond Deep Learning
AI Beyond Deep LearningAI Beyond Deep Learning
AI Beyond Deep LearningAndre Freitas
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Data Science London
 
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Andre Freitas
 
Improving search with neural ranking methods
Improving search with neural ranking methodsImproving search with neural ranking methods
Improving search with neural ranking methodsvoginip
 
On the Semantic Representation and Extraction of Complex Category Descriptors
On the Semantic Representation and Extraction of Complex Category DescriptorsOn the Semantic Representation and Extraction of Complex Category Descriptors
On the Semantic Representation and Extraction of Complex Category DescriptorsAndre Freitas
 
From Linked Data to Semantic Applications
From Linked Data to Semantic ApplicationsFrom Linked Data to Semantic Applications
From Linked Data to Semantic ApplicationsAndre Freitas
 
CSHALS 2010 W3C Semanic Web Tutorial
CSHALS 2010 W3C Semanic Web TutorialCSHALS 2010 W3C Semanic Web Tutorial
CSHALS 2010 W3C Semanic Web TutorialLeeFeigenbaum
 
Genealogical domain
Genealogical domainGenealogical domain
Genealogical domainjcampany
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Bhaskar Mitra
 
Gleaning Types for Literals in RDF with Application to Entity Summarization
Gleaning Types for Literals in RDF with Application to Entity SummarizationGleaning Types for Literals in RDF with Application to Entity Summarization
Gleaning Types for Literals in RDF with Application to Entity SummarizationKalpa Gunaratna
 

Similar a Managing Research Data Identity and Relationships (20)

Effective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsEffective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP Systems
 
Some thoughts about the gaps across languages and domains through the experi...
Some thoughts about the gaps across languages and domains through the experi...Some thoughts about the gaps across languages and domains through the experi...
Some thoughts about the gaps across languages and domains through the experi...
 
STI Summit 2011 - Digital Worlds
STI Summit 2011 - Digital WorldsSTI Summit 2011 - Digital Worlds
STI Summit 2011 - Digital Worlds
 
Information Quality in the Web Era
Information Quality in the Web EraInformation Quality in the Web Era
Information Quality in the Web Era
 
Similarity on DBpedia
Similarity on DBpediaSimilarity on DBpedia
Similarity on DBpedia
 
More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?
 
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWC
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWCFueling the future with Semantic Web patterns - Keynote at WOP2014@ISWC
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWC
 
Semantic Similarity Assessment to Browse Resources exposed as Linked Data: an...
Semantic Similarity Assessment to Browse Resources exposed as Linked Data: an...Semantic Similarity Assessment to Browse Resources exposed as Linked Data: an...
Semantic Similarity Assessment to Browse Resources exposed as Linked Data: an...
 
5 Lessons Learned from Designing Neural Models for Information Retrieval
5 Lessons Learned from Designing Neural Models for Information Retrieval5 Lessons Learned from Designing Neural Models for Information Retrieval
5 Lessons Learned from Designing Neural Models for Information Retrieval
 
AI Beyond Deep Learning
AI Beyond Deep LearningAI Beyond Deep Learning
AI Beyond Deep Learning
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
 
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
 
Improving search with neural ranking methods
Improving search with neural ranking methodsImproving search with neural ranking methods
Improving search with neural ranking methods
 
On the Semantic Representation and Extraction of Complex Category Descriptors
On the Semantic Representation and Extraction of Complex Category DescriptorsOn the Semantic Representation and Extraction of Complex Category Descriptors
On the Semantic Representation and Extraction of Complex Category Descriptors
 
From Linked Data to Semantic Applications
From Linked Data to Semantic ApplicationsFrom Linked Data to Semantic Applications
From Linked Data to Semantic Applications
 
CSHALS 2010 W3C Semanic Web Tutorial
CSHALS 2010 W3C Semanic Web TutorialCSHALS 2010 W3C Semanic Web Tutorial
CSHALS 2010 W3C Semanic Web Tutorial
 
Topics Modeling
Topics ModelingTopics Modeling
Topics Modeling
 
Genealogical domain
Genealogical domainGenealogical domain
Genealogical domain
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)
 
Gleaning Types for Literals in RDF with Application to Entity Summarization
Gleaning Types for Literals in RDF with Application to Entity SummarizationGleaning Types for Literals in RDF with Application to Entity Summarization
Gleaning Types for Literals in RDF with Application to Entity Summarization
 

Más de National Information Standards Organization (NISO)

Más de National Information Standards Organization (NISO) (20)

Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"
 
Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
 
Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
 
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
 
Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"
 
Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"
 
Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"
 
Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"
 
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
 
Ross and Clark "Strategic Planning"
Ross and Clark "Strategic Planning"Ross and Clark "Strategic Planning"
Ross and Clark "Strategic Planning"
 
Mattingly "Data Mining Techniques: Classification and Clustering"
Mattingly "Data Mining Techniques: Classification and Clustering"Mattingly "Data Mining Techniques: Classification and Clustering"
Mattingly "Data Mining Techniques: Classification and Clustering"
 
Straza "Global collaboration towards equitable and open science: UNESCO Recom...
Straza "Global collaboration towards equitable and open science: UNESCO Recom...Straza "Global collaboration towards equitable and open science: UNESCO Recom...
Straza "Global collaboration towards equitable and open science: UNESCO Recom...
 
Lippincott "Beyond access: Accelerating discovery and increasing trust throug...
Lippincott "Beyond access: Accelerating discovery and increasing trust throug...Lippincott "Beyond access: Accelerating discovery and increasing trust throug...
Lippincott "Beyond access: Accelerating discovery and increasing trust throug...
 
Kriegsman "Integrating Open and Equitable Research into Open Science"
Kriegsman "Integrating Open and Equitable Research into Open Science"Kriegsman "Integrating Open and Equitable Research into Open Science"
Kriegsman "Integrating Open and Equitable Research into Open Science"
 
Mattingly "Ethics and Cleaning Data"
Mattingly "Ethics and Cleaning Data"Mattingly "Ethics and Cleaning Data"
Mattingly "Ethics and Cleaning Data"
 
Mercado-Lara "Open & Equitable Program"
Mercado-Lara "Open & Equitable Program"Mercado-Lara "Open & Equitable Program"
Mercado-Lara "Open & Equitable Program"
 


Managing Research Data Identity and Relationships

  • 1. The Many and the One BCE problems in 21st c. data curation Tracking it Back to the Source: Managing and Citing Research Data NISO Forum, Denver, Sept 24, 2012 Allen H. Renear Graduate School of Library and Information Science University of Illinois at Urbana-Champaign Principal researchers of material presented: David Dubin, Karen M. Wickett, Simone Sacchi, Richard Urban, Allen H. Renear Center for Informatics Research in Science and Scholarship Graduate School of Library and Information Science University of Illinois at Urbana-Champaign NSF/OCI-ITR DataNet Award #0830976 IMLS/LB Award #RE-05-08-0062-08
  • 2. Problems, Problems, Problems Identity problems: – Is this the data we think it is? Is it the same data as that data? (involves issues of authenticity, integrity, encoding) Meaning problems: – What is this data supposed to be telling us? (involves interpreting the semantics of the data) Relationship problems: – How is this data related to that data? (involves issues of data provenance) Integration problems: – How can I combine this data with other data? (involves harmonizing conflicts at multiple levels) Interoperation problems: – How can I get this data to work with my software? (involves conversion to equivalent formats) An issue underlying all these is representation… how do digital files represent facts about the world?
  • 3. Identity Problems Two scientists, Jill and John, used the same data. What does that mean? And how can we tell?
  • 4. Identity Problems Compare: Two scientists, Jill and John, used the same statistician.
  • 5. Identity Problems Compare: Two scientists, Jill and John, used the same centrifuge.
  • 6. Identity and Representation Levels Consider two files with the … same data, but relational tables in one case and RDF triples in another; … with the same data and the same RDF triples, but an XML serialization in one case and an N3 serialization in another; … with the same data, the same RDF triples, the same N3 serialization, but UTF-8 character encoding in one case and UTF-16 encoding in another. How many levels do we need? How do we define and manage them? How can they be identified and re-identified? Which identifier schemes for which level?
  • 7. What is a dataset anyway?! Maybe we should ask a scientist. They’ll have an answer, right?
  • 8. There are almost as many answers as scientists.
  • 9. Cries from the heart “the terms ‘Data Product’, ‘Data Set,’ and ‘Version’ are overlaid with multiple meanings between communities.” (Barkstrom, 2009) “There is ambiguity in what type of object a dataset is; with different groups of users applying different connotations. There needs to be an explicit statement of what the intended preservation of a dataset will imply.” (Pepler, 2008)
  • 10. Forcing us to conclude… No single object can possibly have all those attributes. Therefore it is impossible to give the common colloquial notion of dataset a precise definition. It must instead be replaced by a family of new, more specific concepts. Sound familiar?
  • 12. A FRBR-inspired solution FRBR eliminates the ordinary “book” from our world. The ordinary “book” can be simultaneously about chordata, in French, typeset in neo-Bauhaus, and mustard-stained, but FRBR replaces the book with four objects: the work is about chordata, the expression is in French, the manifestation is typeset in neo-Bauhaus, the item is mustard-stained.
  • 13. FRBR entities and attributes Work: “an … intellectual or artistic creation”. Expression: “the … realization of a work … notation … etc.” Manifestation: “the physical embodiment of an expression of a work”. Item: “a single exemplar of a manifestation”. Attribute assignments are characteristically disjoint. A work may have a subject; it does not have a language, typeface, or condition. An expression may have a language; it does not have a subject (or a typeface or a condition). A manifestation may have a typeface; it does not have a subject or a language (or a condition). An item may have a condition; it does not have a subject, language, or typeface.
  • 14. Entities? Really? Aren’t some of those rectangles just nominalized relationships?
  • 15. Ambiguities Is <object name="sample_31"> <feature name="U22376" value="408" /> <feature name="X59417" value="1784" /> an expression? Is “00001011” an expression?
  • 16. FRBR Refactored Story M:M Symbol Structure M:M Symbol Structure M:M Matter & Energy
  • 17. FRBR refactored and applied to datasets (all relationships M:M) C1: observations [Semantic level] expressed by… S1: RDF triples encoded by… S2: N3 statements [Syntax level] [Encoding levels] encoded by… S3: Unicode characters encoded by… S4: UTF-8 bit streams inscribed in… M1: RAID array state [Instantiation level]. Based on the Systematic Assertion Model (SAM) for modeling datasets, developed by David Dubin et al.
  • 18. Identifiers What do we identify with identifiers? An entity? Content, symbol structures, patterned matter and energy? A nominalized relationship? How do we confirm identification?
  • 19. Identification How do we identify an expression? How do we identify an encoding? How do we identify the data? On the practical side we do this every day. On the theoretical side it is very difficult to usefully formalize.
  • 20. Identity and change problems in Planets From the Planets Conceptual Data Model, Sharpe et al. (2006)
  • 21. Identity and change problems in Planets • A file is a bitstream • A file can be modified • But a bitstream cannot be modified. Credits to Dave Dubin, Simone Sacchi, Karen Wickett. Data Concepts Group, Data Conservancy (NSF/OCI-ITR DataNet Award #0830976)
  • 22. Center for Informatics Research in Science and Scholarship (CIRSS) Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Director: Carole L. Palmer. Associate Director: Cathy Blake. About 12 affiliated GSLIS faculty; 8 PhD students. CIRSS research groups: Data Practices: social science of information work; Socio-Technical Data Analytics: algorithms + people; *Data Concepts: modeling for integration/computation. Professional Education: Data curation specialization within an ALA-accredited LIS program. Other options are being planned.
  • 23. CIRSS Data Concepts Group Rationale: Integration and interoperability require robust formal conceptual models for scientific data, especially if semantic technologies are going to be exploited. Our current models aren’t good enough. Mission: The Data Concepts Group takes a logic-based approach to solving conceptual modeling problems in scientific data curation.
  • 24. Questions? This research is being carried out by the Data Concepts Group at the Center for Informatics Research in Science and Scholarship (CIRSS) at the University of Illinois at Urbana-Champaign, Carole L. Palmer, Director. Principal contributors include David Dubin, Karen M. Wickett, Simone Sacchi, Richard Urban, Allen H. Renear. NSF/OCI-ITR DataNet Award #0830976, IMLS/LB Award #RE-05-08-0062-08
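
The representation levels on slides 6 and 17 can be illustrated with a small sketch. It is only an illustration, not part of the Systematic Assertion Model: the N3-style statement below is hypothetical (its names are borrowed from the sample on slide 15), and SHA-256 stands in for whatever fixity check a repository might use. The point is that two files can be identical at the character level while differing at the bit-stream level, so a checksum identifies the encoding, not "the data".

```python
import hashlib

# Hypothetical N3-style statement; the names are illustrative only.
n3_statement = '<#sample_31> <#U22376> "408" .'

# Same Unicode characters, two different character encodings.
utf8_bytes = n3_statement.encode("utf-8")
utf16_bytes = n3_statement.encode("utf-16-le")


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a bit stream as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Identity at the bit-stream level: the two encodings differ.
same_bitstream = utf8_bytes == utf16_bytes
same_checksum = sha256_hex(utf8_bytes) == sha256_hex(utf16_bytes)

# Identity at the character level: decoding recovers the same text.
same_characters = utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-le")

print(same_bitstream, same_checksum, same_characters)
```

A comparison at the character level answers a different identity question than a comparison at the bit-stream level, which is why the slides ask which identifier scheme belongs to which level.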

Editor's notes

  1. I’ll open with some cries from the heart; bear with me while this… You can find others. And more succinctly…