CISER Data Archive &
Introduction to RDM
Stuart Macdonald
CISER Data Services Librarian
srm262@cornell.edu
Research Design CRP-7201, Stone Laboratory, Cornell Univ. 19 March 2014
• CISER Data Archive
• What is Research Data Management (RDM)
• Research Data Defined
• Data Management Planning
• Organising Data
• File Formats & Transformations
• Documentation & Metadata
• Storage & Security
• Data protection & Rights
• Preservation & Sharing
• Research Data MANTRA
CISER Data Archive: Collection and Services
Established over 30 years ago
Collection of numeric datasets to support quantitative
research
c. 27,000 online files in addition to thousands of studies on CD/DVD
Emphasis on demography (state/federal censuses),
economics, health, labor, election studies, attitudinal and
behavioral studies, family life etc.
• Consulting services to match user needs with appropriate data
and statistical analysis software
• Finding, accessing and using data
• Current Cornell researchers can download archive files from the
online catalog (search & browse) in formats compatible with
statistical software
• Data files are identified by a ‘traffic light’ icon that indicates
usage level:
• Green – downloadable by anyone
• Yellow – downloadable from links in the catalog with CUWebAuth
authentication (for use within the CISER research computing
environment - CISERRSCH) – Cornell researchers can apply for a
computing account
• Red – data that can only be used under restrictions (via CRADC or
under conditions imposed by the data provider)
CISER Data Catalog:
The CISER Data Archive maintains links to a range of social science
data resources including:
•Data Distributors and Producers: U.S. Government e.g. Dept. Agriculture,
Dept. Commerce, Dept. Energy, Dept. Justice, Dept. Labor, Federal Agencies
•Data Distributors and Producers: Other U.S. Sources
•Data Distributors and Producers: International e.g. Eurostat, FAOSTAT, ILO,
OECD, UN Statistics Division, World Bank
•Data Libraries and Archives e.g. Harvard-MIT Data Center, UKDA, DANS, CESSDA
•Social Science Research Institutes e.g. Odum Institute, Survey Research
Institute
•Online Reference Tools e.g. Boundary files, geocoding tools, SIC codes, data
citation tools
•State and Local Government data and statistical sources e.g. NY State
Depts. Education, Health, Labor, State Data Center
See URL: http://ciser.cornell.edu/ASPs/datasource.asp
• Provides Cornell social science researchers with a
repository for sharing and providing long-term preservation
of their numeric/statistical research data
• Participates in Cornell’s Research Data Management
Service Group
• Assists Cornell social science researchers with Research
Data Management (RDM) plans
• Provides Cornell social science researchers with support
and expertise in obtaining and using restricted data
Other social science research data resources:
• Inter-University Consortium for Political and Social Research
(ICPSR)
• National Archive of Criminal Justice Data
• Minority Data Resource Center
• National Archive of Computerized Data on Aging
• Roper Center for Public Opinion Archives
• International Data Archives
• CESSDA, UKDA, Eurostat
• CESSDA catalog (DDI) provides a multi-lingual interface to datasets from
member social science data archives across Europe
• Non-Governmental Organizations
• National / Governmental Statistical Agencies
URLs:
• CISER Data Archive Catalog:
http://ciser.cornell.edu/ASPs/search.asp
• ICPSR:
www.icpsr.umich.edu/
• Roper Center for Public Opinion Research:
http://www.ropercenter.uconn.edu/
• CESSDA:
http://www.cessda.org/
• Eurostat:
http://www.epp.eurostat.ec.europa.eu/
Location & hours:
CISER Data Archive is located at 391 Pine Tree Road,
Ithaca
CISER is open 8.30am – 4.30pm (Mon-Fri). Walk-in assistance is not
always available, so appointments are recommended.
Contacts:
Tel.: (607) 255 4801
Email: ciser@cornell.edu
Introduction to Research
Data Management (RDM)
Why Manage Research Data?
Current research data management initiatives are based
on three trends:
The data deluge – exponential growth in volume of digital
research artifacts created within academia (often
created by publicly funded research)
Data management is required by multiple disciplines
Increasing perception of the value of data (data as
commodity)
What is Research Data Management?
• RDM is an umbrella term that describes all aspects
of planning, organising, documenting, storing and
sharing research data.
• It also takes into account issues such as
documentation, data protection and
confidentiality.
• It provides a framework that supports researchers
and their data throughout the course of their
research and beyond.
• It is one of the essential areas of responsible
conduct of research
Research Data Lifecycle
Pink Colored Umbrellas Are Pretty Darned Rainproof
Research Data Defined
US Office of Management and Budget in its grants management circular A-110
defines research data as “the recorded factual material commonly accepted in
the scientific community as necessary to validate research findings.”
The KRDS2 study (Beagrie et al, 2009) defines research data as ‘collections of
structured digital data from any disciplines or sources which can be used by
academic researchers to undertake their research or provides an evidential
record of their research.’
RIN Classification*
• Observational – real-time, unique, usually irreplaceable
• Experimental – from lab equipment, expensive, often reproducible
• Simulation – generated from models – model & metadata are as important as
output data
• Derived – resulting from processing or combining “raw” data; reproducible
but expensive
• Reference - a (static or organic) collection of smaller (peer-reviewed)
datasets, probably published and curated
* Stewardship of digital research data: a framework of principles and guidelines, Research Information Network, 2008. URL: http://tinyurl.com/l56gftx
Research Data Defined
• Research data, unlike other information types, is
collected, observed, or created, for purposes of
analysis to produce original research results.
• Research data can be generated for different
purposes and through different processes in a
multitude of digital formats.
Research data comes in many varied formats:
Text - Flat text files, Word, Portable Document Format (PDF), Rich
Text Format (RTF), Extensible Markup Language (XML).
Numerical - SPSS, Stata, Excel.
Multimedia - jpeg, tiff, dicom, mpeg, quicktime.
Models - 3D, statistical.
Software - Java, C.
Discipline specific - Flexible Image Transport System (FITS) in
astronomy, Crystallographic Information File (CIF) in chemistry.
Instrument specific - Olympus Confocal Microscope Data Format, Carl
Zeiss Digital Microscopic Image Format (ZVI).
Research data may include the
following:
• Documents (text, MS Word), spreadsheets
• Lab books, field notes, diaries
• Questionnaires, transcripts, codebooks
• Audiotapes, videotapes, photographs, images
• Slides, artefacts, specimens, samples
• Collection of digital objects acquired & generated during the research
process
• Database contents (video, audio, text, images)
• Models, algorithms, scripts
• Contents of an application (input, output, logfiles for analysis software,
schemas)
• Methodologies, workflows
• SOPs, protocols
By managing your data you will:
• ensure scientific integrity of research and aid replication
• ensure research data and records are accurate, complete, authentic
and reliable
• increase your research efficiency
• save time, effort and resources in the long run
• enhance data security and minimise the risk of data loss
• prevent duplication of effort by enabling others to use your data
• meet funding grant requirements
Note:
It may also be important to manage research records (both digital &
hardcopy) during and beyond the life of the project such as:
correspondence (emails)
grant applications
technical reports
research reports
consent forms
ethics applications
What Do Funders Want?
• timely release of data
- once patents are filed or on (acceptance for)
publication.
• data shared openly
- minimal or no restrictions if possible.
• preservation of data
- typically 5-10+ years if of long-term value.
• data management plans
See:
NIH Data Sharing Policy: https://grants.nih.gov/grants/policy/data_sharing/
NSF Data Sharing Policy: http://www.nsf.gov/bfa/dias/policy/dmp.jsp
Data Management Plan. What is it?
Funding bodies require researchers to supply detailed, cost-effective
plans for managing research data. These are called Data Management
Plans (DMPs).
A DMP is a document which describes:
• What research data will be created.
• What policies (funding, institutional, legal) apply to the data.
• What data management practices (backups, storage, access
control, archiving) will be used.
• What facilities and equipment are required (hard-disk space,
backup server, repository).
• Who will own the copyright and have access to the data.
• How long-term preservation will be ensured after the original
research is completed.
The data management plan must be continuously maintained and
kept up-to-date throughout the course of research.
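One way to keep a plan reviewable is to hold its sections in a small structured record, so that unanswered sections are easy to list at each review. The sketch below is only an illustration: the class name and field names are hypothetical labels for the six points listed above, not part of any funder's template.

```python
from dataclasses import dataclass, fields


@dataclass
class DataManagementPlan:
    """Hypothetical record with one free-text field per DMP question above."""

    data_to_be_created: str = ""
    applicable_policies: str = ""
    management_practices: str = ""
    facilities_and_equipment: str = ""
    ownership_and_access: str = ""
    long_term_preservation: str = ""

    def open_questions(self) -> list[str]:
        # A section counts as open while its text is empty or whitespace only.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


if __name__ == "__main__":
    plan = DataManagementPlan(data_to_be_created="Survey responses stored as CSV")
    for section in plan.open_questions():
        print("Still to fill in:", section)
```

Listing the open sections at each review fits the idea that the plan is maintained throughout the project rather than written once.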
Why do we need one?
It improves your research both now and later...
•Data is often valuable for a long time!
•Results of your research may outlast your project.
•Will you use your data throughout your career?
•Prevents loss of digital data and records.
•Prevents loss of usefulness through media and software
obsolescence.
•Prevents you from forgetting important details!
Good practice → Better research
Why do we need one?
•Ensure research integrity (and repeatability) through
keeping better records.
•People can trace your outcomes from data collection,
through research methodology, through to results.
•Maximises usefulness of data to fellow researchers.
•Highlights how data was collected, quality controls,
how people can and should use it (access and
licensing).
•Facilitates data use within collaboration.
•Can help lead to subsequent research papers.
Getting started with a DMP

• Gain an understanding of terminology & issues.
• Gain an understanding of your project/community
– Supervisor and colleagues
– People in your School, e.g. IT Officers, Research
Coordinator/Administrator
• Talk to your supervisor about data authorship, IP, licensing,
policies.
• Keep it practical and simple, and don't spend too much time on it.
Where you don't know something, leave a gap, investigate, and fill
it in later.
• Remember it is never finished! Review it regularly through the
course of your research.
CDL’s DMP Tool: https://dmp.cdlib.org/
Cornell University RDM Services Group - Writing a DMP:
https://confluence.cornell.edu/display/rdmsgweb/data-management-planning-overview
Questions?
Benefits of organising your data
Research data files and folders need to be labelled and
organised in a systematic way so that:
•Data files are not accidentally overwritten or deleted
•Data files are distinguishable from each other within their
containing folder
•Data file naming prevents confusion when multiple people are
working on shared files
•Data files are easier to locate and browse
•Data files can be retrieved by both creator and by other users
•Data files can be sorted in logical sequence
•Different versions of data files can be identified
•If data files are moved to other storage platforms their names
will retain useful context
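A naming convention is easier to keep consistent when the names are generated rather than typed. The following is a minimal sketch, assuming a hypothetical pattern of project, description, ISO date and zero-padded version; the function name and the pattern itself are illustrative choices, not a CISER or Cornell standard.

```python
import re
from datetime import date


def build_filename(project: str, description: str, when: date,
                   version: int, extension: str) -> str:
    """Return a name such as 'crp7201_survey-responses_2014-03-19_v02.csv'."""

    def clean(part: str) -> str:
        # Lower-case the text and turn every run of other characters into '-'.
        return re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-")

    # ISO 8601 dates (YYYY-MM-DD) sort chronologically when sorted as text,
    # and zero-padding keeps v02 ahead of v10 in a directory listing.
    return (f"{clean(project)}_{clean(description)}_"
            f"{when.isoformat()}_v{version:02d}.{extension.lstrip('.').lower()}")


if __name__ == "__main__":
    print(build_filename("CRP7201", "Survey responses", date(2014, 3, 19), 2, "csv"))
```

Because the project and date travel with the file name, the name still carries context if the file is moved to another storage platform.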
File Formats & Transformation
• Files are based on either text or binary encoding. The
former is both machine- and human-readable and the latter
only readable by means of appropriate software.
• Thus text files are less likely to become obsolete. Examples
of file name extensions for these files are .txt, .csv
and .por. 
• Be aware of the file formats your data exists in
– Does this format require a specific type of software?
– Can others access the data in this format?
– Can alternative formats be used?
• Using widely available or open formats maximises the
chances of your data being stable and usable
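To find out which of your files are text-based, a rough check can be scripted. The sketch below is a heuristic only, under the assumption that a text file contains no NUL bytes and decodes as UTF-8; the function name is hypothetical, and files in other text encodings would be reported as binary.

```python
import codecs
from pathlib import Path


def looks_like_text(path: Path, sample_size: int = 8192) -> bool:
    """Heuristic: the first bytes hold no NUL byte and decode as UTF-8."""
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    if b"\x00" in sample:
        return False
    # An incremental decoder tolerates a multi-byte character that is cut
    # off at the end of the sample, but still rejects invalid bytes.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

Running it over a project folder gives a quick list of files that depend on specific software to be read, which is where the questions above about alternative formats matter most.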
File Formats & Transformation
•When compressing your data files for storage or
transportation you encode the information using fewer bits than
the original representation. Commonly used tools are Zip and
gzip; Tar bundles files into a single archive and is usually
combined with a compressor such as gzip.
•You may normalise your data, i.e. convert it from one format
(e.g. proprietary) into another (e.g. ASCII) for use or
preservation.
•If you convert or migrate your data files from one format to
another, be aware of the potential risk of data loss or corruption
and take appropriate steps to avoid/minimise it.
•Watch out for backwards compatibility if software is upgraded
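The points above about conversion and compression can be sketched with the Python standard library. This is a minimal illustration, assuming the data is already available as rows of string values; the function names are hypothetical. It writes the rows as CSV, reads them back to confirm nothing was lost in conversion, and uses a checksum to confirm that gzip compression is lossless.

```python
import csv
import gzip
import hashlib
import io


def rows_to_csv_bytes(rows: list[dict[str, str]]) -> bytes:
    """Serialise rows as UTF-8 CSV, a plain-text format most tools can read."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def csv_bytes_to_rows(data: bytes) -> list[dict[str, str]]:
    """Read the CSV back so the conversion can be compared with the source."""
    return list(csv.DictReader(io.StringIO(data.decode("utf-8"), newline="")))


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    original = [{"id": "1", "age": "34"}, {"id": "2", "age": "51"}]
    as_csv = rows_to_csv_bytes(original)
    # Step 1: the converted file must contain exactly the original values.
    assert csv_bytes_to_rows(as_csv) == original
    # Step 2: decompressing must give back byte-identical content.
    compressed = gzip.compress(as_csv)
    assert sha256(gzip.decompress(compressed)) == sha256(as_csv)
    print("conversion and compression verified")
```

Note that every value is treated as text here; a real conversion also needs to check that numeric precision, dates and missing-value codes survive the change of format.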
Exercise 1. Formatting your data
Documenting Data
There are many reasons why you need to document your
data:
•To help you remember the details later
•To help others understand your research
•Verify your findings
•Review your submitted publication
•Replicate your results
•Archive your data for access and re-use
Some examples of data documentation are:
•Laboratory notebooks
•Field notes
•Questionnaires
Documenting Data
Research data need to be documented at various levels:
•Project level
•File or database level
•Variable or item level
The term metadata (‘data about data’) is often used.
The importance of metadata lies in the potential for
machine-to-machine interoperability to assist location and
access to data through search interfaces.
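As an illustration of the three levels, documentation can be kept in a machine-readable record stored next to the data. The sketch below assumes a simple, home-made JSON layout; the keys and all example values are hypothetical and do not follow a formal metadata standard such as DDI.

```python
import json

# Hypothetical example values; a real record would describe your own study.
record = {
    "project": {
        "title": "Example household survey",
        "creator": "A. Researcher",
        "methodology": "Postal questionnaire, one wave",
    },
    "files": [
        {
            "name": "survey_2014-03-19_v01.csv",
            "format": "text/csv",
            "variables": [
                {"name": "id", "label": "Respondent identifier", "type": "integer"},
                {"name": "age", "label": "Age at interview, in years", "type": "integer"},
                {"name": "q7", "label": "", "type": "string"},
            ],
        }
    ],
}


def undocumented_variables(metadata: dict) -> list[str]:
    """Return 'file:variable' for every variable whose label is empty."""
    missing = []
    for data_file in metadata["files"]:
        for variable in data_file["variables"]:
            if not variable.get("label", "").strip():
                missing.append(f"{data_file['name']}:{variable['name']}")
    return missing


if __name__ == "__main__":
    print(json.dumps(record, indent=2))
    print("Needs a label:", undocumented_variables(record))
```

Because the record is structured, a script or a catalogue can read it as easily as a person can, which is the interoperability benefit described above.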
Secure data storage:
For the purposes of integrity and efficiency it is important that research
data is stored securely & backed up regularly via:
• Networked drives
• Fileservers managed by department / school / IT Dept.
• Stored in single, secure, accessible place – regular back-ups.
• Personal computers / laptops
• Convenient, temporary storage - should not be used for storing
master copies.
• Local drives may fail & laptops may get lost/stolen.
• External storage devices
• Hard drives, USB sticks, CDs, DVDs – low cost & portable BUT not
recommended for long term storage.
• Longevity not guaranteed – degradation over time.
• Easily damaged or misplaced.
• Not big enough for all research data – might need to use multiple
discs/drives.
• May pose a security threat.
If USB sticks, DVDs, CDs are used for working data or extra back-up
then:
• Choose high quality products from reputable manufacturers.
• Conduct regular checks to ensure media is not failing.
• Periodically refresh data (i.e. copy to a new disc or drive).
• Ensure confidential data is password protected / encrypted
• Remote or online back-up services – services that
provide an online system for storing and backing-up computer
files e.g. Dropbox, Mozy, Humyo, A-Drive
• Allow users to store and sync data files online and between
computers.
• Employ cloud computing storage facilities (e.g. Amazon S3).
• Business model – first few GBs free, pay for more space.
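The advice above to check regularly that media is not failing can be automated with checksums. The following is a minimal sketch, assuming the data lives in one folder and that a manifest of SHA-256 values was saved when the copy was made; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def failed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return manifest entries that are missing or no longer match."""
    failures = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or file_sha256(path) != expected:
            failures.append(name)
    return failures
```

A typical routine would be to call `build_manifest` once when the copy is written, store the result away from the media being checked, and run `failed_files` against each copy at regular intervals.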
Backing-up
Considerations for back-up policy:
• Whether all data (full back-up), or only changed data will be
backed-up (incremental back-up)?
• How often full and incremental back-ups will be made?
• How much hard-drive space or DVDs will be required to maintain
this schedule?
• If working with sensitive data, how will it be secured (and
destroyed)?
• What back-up services are available that meet these needs?
• Who will be responsible for ensuring back-ups are available?
Recommendation:
Keep at least 3 copies of your data (e.g. original, external/local,
and external/remote) and put in place a regular back-up procedure
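The difference between a full and an incremental back-up can be sketched as follows. This is an illustration only, assuming that a file counts as changed when its size or modification time differs from the previous snapshot, and that the destination folder lies outside the source folder; the function names are hypothetical, and a maintained back-up tool is preferable for real data.

```python
import shutil
from pathlib import Path


def snapshot(root: Path) -> dict[str, tuple[int, int]]:
    """Map each relative file name to (size in bytes, modification time in ns)."""
    result = {}
    for path in root.rglob("*"):
        if path.is_file():
            info = path.stat()
            result[path.relative_to(root).as_posix()] = (info.st_size, info.st_mtime_ns)
    return result


def changed_since(previous: dict, current: dict) -> list[str]:
    """Return files that are new or differ from the previous snapshot."""
    return sorted(name for name, state in current.items()
                  if previous.get(name) != state)


def back_up(root: Path, destination: Path, names: list[str]) -> None:
    """Copy the named files, keeping their folder structure and timestamps."""
    for name in names:
        target = destination / name
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(root / name, target)
```

Passing an empty previous snapshot to `changed_since` selects every file, which gives a full back-up; passing the snapshot from the last run selects only new or changed files, which gives an incremental one.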
Data Security
Data security is the means of ensuring that data is kept safe from
corruption and that access to it is suitably controlled. It is
important to consider data security to prevent:
• Accidental or malicious damage / modification to data.
• Theft of valuable or irreplaceable data.
• Breach of confidentiality agreements and privacy laws.
• Release of data before it has been checked for accuracy and
authenticity.
Exercise 2. Data storage and Security
Data Protection (also called data privacy)
• In the US, there is no single, comprehensive federal (national) law
regulating the collection and use of personal data. Instead, the US has
a patchwork system of federal and state laws, and regulations that
overlap, dovetail and may contradict one another.
• The combination of an increase in cross-border data flow, together
with the increased enactment of data protection statutes heightens the
risk of privacy violations and creates a significant challenge for a data
owner/distributor.
Data protection is the relationship between:
•collection and dissemination of data
•technology
•the public expectation of privacy and the legal and political issues
surrounding them
Rights and access
• Intellectual property rights (IPR) can be defined as rights acquired
over any work created or invented with the intellectual effort of an
individual.
• Facts are not copyrightable but the structure of a database could be.
• As a researcher, you should clarify ownership of and rights relating to
research data before a project starts. This includes the right of access
and the right to make copies.
• Data licences determine the terms and conditions of use by another,
and may accompany a purchase or subscription.
• Open data licences attempt to “set data free” by minimising and
standardising the terms and conditions of re-use. Conditions may
include attribution, non-commercial use, no derivative works, or ‘share
alike’.
Open Data Commons (ODC) have prepared a set
of licences each with an accompanying statement
which can be placed with your data on a webpage
that points to your data.
Open Data Commons: http://opendatacommons.org/
Benefits of Sharing Data
• Scientific integrity – publishing & citing data in published
research papers can allow others to replicate, validate, or
correct results, thus improving the scientific record.
• Publicly funded research - there is a growing movement for
making publicly funded research available to the public.
• Funding mandates - US Funding Agencies are increasingly
mandating data sharing so as to avoid duplication of effort and
save costs.
• Preserve research data for researchers’ own future use.
Research Data MANTRA
Partnership between:
EDINA & Data Library, University of Edinburgh
Institute for Academic Development
Funded by JISC Managing Research Data Programme (Sept.
2010 – Aug. 2011)
Aim was to develop online interactive open learning resources
for PhD students and early career researchers that will:
Raise awareness of the key issues related to research data
management & contribute to culture change.
Provide guidelines for good practice.
Eight units with activities, scenarios and videos:
• Research data explained
• Data management plans
• Organising data
• File formats and transformation
• Documentation and metadata
• Storage and security
• Data protection, rights and access
• Preservation, sharing and licensing
Four data handling practicals: SPSS, NVivo, R, ArcGIS
Video stories from researchers in variety of settings
Online Learning Module
• Delivered online – self-paced, available ‘anytime, anyplace’
• Emphasis on practical experience and active engagement via
online activities
• One hour per unit
• Read and work through scenarios & activities (incl. videos etc)
• CC licence to allow manipulation of content for re-use with
attribution
• Portable content in open standard formats (e.g. SCORM)
• Research data MANTRA course:
http://datalib.edina.ac.uk/mantra
Questions?

 
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...Sarah Anna Stewart
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...ICPSR
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data ManagementJamie Bisset
 
Addressing Institutional Research Data Management - University of Edinburgh R...
Addressing Institutional Research Data Management - University of Edinburgh R...Addressing Institutional Research Data Management - University of Edinburgh R...
Addressing Institutional Research Data Management - University of Edinburgh R...EDINA, University of Edinburgh
 
RDMRose 1.1 The basics
RDMRose 1.1 The basicsRDMRose 1.1 The basics
RDMRose 1.1 The basicsRDMRose
 
Steve Mc Eachern Australian Data Archive
Steve Mc Eachern Australian Data ArchiveSteve Mc Eachern Australian Data Archive
Steve Mc Eachern Australian Data ArchiveFuture Perfect 2012
 
Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Jeroen Rombouts
 
E research africa presentation (19 nov 2014)
E research africa presentation (19 nov 2014)E research africa presentation (19 nov 2014)
E research africa presentation (19 nov 2014)Isak Van der Walt
 
Managing provenance in the Social Sciences: the Data Documentation Initiative...
Managing provenance in the Social Sciences: the Data Documentation Initiative...Managing provenance in the Social Sciences: the Data Documentation Initiative...
Managing provenance in the Social Sciences: the Data Documentation Initiative...ARDC
 

Similar a Rdm slides march 2014 (20)

Researh data management
Researh data managementResearh data management
Researh data management
 
Introduction to Research Data Management
Introduction to Research Data ManagementIntroduction to Research Data Management
Introduction to Research Data Management
 
Scottish Digital Library Consortium Meeting: Edinburgh DataShare
Scottish Digital Library Consortium Meeting: Edinburgh DataShareScottish Digital Library Consortium Meeting: Edinburgh DataShare
Scottish Digital Library Consortium Meeting: Edinburgh DataShare
 
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data Management
 
RDM Programme @ Edinburgh: Data Librarian Experience
RDM Programme @ Edinburgh: Data Librarian ExperienceRDM Programme @ Edinburgh: Data Librarian Experience
RDM Programme @ Edinburgh: Data Librarian Experience
 
Addressing Institutional Research Data Management - University of Edinburgh R...
Addressing Institutional Research Data Management - University of Edinburgh R...Addressing Institutional Research Data Management - University of Edinburgh R...
Addressing Institutional Research Data Management - University of Edinburgh R...
 
Research data life cycle
Research data life cycleResearch data life cycle
Research data life cycle
 
RDMRose 1.1 The basics
RDMRose 1.1 The basicsRDMRose 1.1 The basics
RDMRose 1.1 The basics
 
User engagement in research data curation
User engagement in research data curationUser engagement in research data curation
User engagement in research data curation
 
CISER & the Data Reference Interview
CISER & the Data Reference InterviewCISER & the Data Reference Interview
CISER & the Data Reference Interview
 
Steve Mc Eachern Australian Data Archive
Steve Mc Eachern Australian Data ArchiveSteve Mc Eachern Australian Data Archive
Steve Mc Eachern Australian Data Archive
 
RDM: a briefing for Health Sciences
RDM: a briefing for Health SciencesRDM: a briefing for Health Sciences
RDM: a briefing for Health Sciences
 
Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10
 
20130222 kaptur training_goldsmiths
20130222 kaptur training_goldsmiths20130222 kaptur training_goldsmiths
20130222 kaptur training_goldsmiths
 
RDM@Edinburgh
RDM@EdinburghRDM@Edinburgh
RDM@Edinburgh
 
Research Data Management: Why is it important?
Research Data Management: Why is it  important?Research Data Management: Why is it  important?
Research Data Management: Why is it important?
 
E research africa presentation (19 nov 2014)
E research africa presentation (19 nov 2014)E research africa presentation (19 nov 2014)
E research africa presentation (19 nov 2014)
 
Managing provenance in the Social Sciences: the Data Documentation Initiative...
Managing provenance in the Social Sciences: the Data Documentation Initiative...Managing provenance in the Social Sciences: the Data Documentation Initiative...
Managing provenance in the Social Sciences: the Data Documentation Initiative...
 

Más de Historic Environment Scotland

Digital Archiving for Archaeological Units at Historic Environment Scotland
Digital Archiving for Archaeological Units at Historic Environment ScotlandDigital Archiving for Archaeological Units at Historic Environment Scotland
Digital Archiving for Archaeological Units at Historic Environment ScotlandHistoric Environment Scotland
 
Archives & Records Association summer seminar Edinburgh 7 June 2019
Archives & Records Association summer seminar   Edinburgh 7 June 2019Archives & Records Association summer seminar   Edinburgh 7 June 2019
Archives & Records Association summer seminar Edinburgh 7 June 2019Historic Environment Scotland
 
Research Data Services @ Edinburgh: MANTRA & Edinburgh DataShare
Research Data Services @ Edinburgh: MANTRA & Edinburgh DataShareResearch Data Services @ Edinburgh: MANTRA & Edinburgh DataShare
Research Data Services @ Edinburgh: MANTRA & Edinburgh DataShareHistoric Environment Scotland
 
EPSRC research data expectations and research software management
EPSRC research data expectations and research software managementEPSRC research data expectations and research software management
EPSRC research data expectations and research software managementHistoric Environment Scotland
 
Introduction to data support services and resources for public policy
Introduction to data support services and resources for public policyIntroduction to data support services and resources for public policy
Introduction to data support services and resources for public policyHistoric Environment Scotland
 
Research Data Management at Edinburgh: Effecting Culture Change
Research Data Management at Edinburgh: Effecting Culture ChangeResearch Data Management at Edinburgh: Effecting Culture Change
Research Data Management at Edinburgh: Effecting Culture ChangeHistoric Environment Scotland
 
Harnessing Collective Intelligence For Sustainable Development
Harnessing Collective Intelligence For Sustainable DevelopmentHarnessing Collective Intelligence For Sustainable Development
Harnessing Collective Intelligence For Sustainable DevelopmentHistoric Environment Scotland
 

Más de Historic Environment Scotland (17)

Digital Archiving for Archaeological Units at Historic Environment Scotland
Digital Archiving for Archaeological Units at Historic Environment ScotlandDigital Archiving for Archaeological Units at Historic Environment Scotland
Digital Archiving for Archaeological Units at Historic Environment Scotland
 
Archives & Records Association summer seminar Edinburgh 7 June 2019
Archives & Records Association summer seminar   Edinburgh 7 June 2019Archives & Records Association summer seminar   Edinburgh 7 June 2019
Archives & Records Association summer seminar Edinburgh 7 June 2019
 
Bonares presentation oct2016v2
Bonares presentation oct2016v2Bonares presentation oct2016v2
Bonares presentation oct2016v2
 
Research Data Services @ Edinburgh: MANTRA & Edinburgh DataShare
Research Data Services @ Edinburgh: MANTRA & Edinburgh DataShareResearch Data Services @ Edinburgh: MANTRA & Edinburgh DataShare
Research Data Services @ Edinburgh: MANTRA & Edinburgh DataShare
 
RDM for trainee physicians
RDM for trainee physiciansRDM for trainee physicians
RDM for trainee physicians
 
EPSRC research data expectations and research software management
EPSRC research data expectations and research software managementEPSRC research data expectations and research software management
EPSRC research data expectations and research software management
 
Introduction to RDM for trainee physicians
Introduction to RDM for trainee physiciansIntroduction to RDM for trainee physicians
Introduction to RDM for trainee physicians
 
Introduction to data support services and resources for public policy
Introduction to data support services and resources for public policyIntroduction to data support services and resources for public policy
Introduction to data support services and resources for public policy
 
RDM @ UoE
RDM @ UoERDM @ UoE
RDM @ UoE
 
RDM Programme at University of Edinburgh
RDM Programme at University of EdinburghRDM Programme at University of Edinburgh
RDM Programme at University of Edinburgh
 
RDM Priorities, Stakeholders, Practice
RDM Priorities, Stakeholders, PracticeRDM Priorities, Stakeholders, Practice
RDM Priorities, Stakeholders, Practice
 
RDM@Edinburgh
RDM@EdinburghRDM@Edinburgh
RDM@Edinburgh
 
AddressingHistory: crowdsourcing the past
AddressingHistory: crowdsourcing the pastAddressingHistory: crowdsourcing the past
AddressingHistory: crowdsourcing the past
 
Research Data Management at Edinburgh: Effecting Culture Change
Research Data Management at Edinburgh: Effecting Culture ChangeResearch Data Management at Edinburgh: Effecting Culture Change
Research Data Management at Edinburgh: Effecting Culture Change
 
Harnessing Collective Intelligence For Sustainable Development
Harnessing Collective Intelligence For Sustainable DevelopmentHarnessing Collective Intelligence For Sustainable Development
Harnessing Collective Intelligence For Sustainable Development
 
Seminario Sobre Datasets Consorcio Madrono
Seminario Sobre Datasets Consorcio Madrono Seminario Sobre Datasets Consorcio Madrono
Seminario Sobre Datasets Consorcio Madrono
 
Aggregation as tactic sm new
Aggregation as tactic sm newAggregation as tactic sm new
Aggregation as tactic sm new
 

Rdm slides march 2014

  • 1. CISER Data Archive & Introduction to RDM Stuart Macdonald CISER Data Services Librarian srm262@cornell.edu Research Design CRP-7201, Stone Laboratory, Cornell Univ. 19 March 2014
  • 2. • CISER Data Archive • What is Research Data Management (RDM) • Research Data Defined • Data Management Planning • Organising Data • File Formats & Transformations • Documentation & Metadata • Storage & Security • Data protection & Rights • Preservation & Sharing • Research Data MANTRA
  • 3. CISER Data Archive: Collection and Services Established over 30 years ago Collection of numeric datasets to support quantitative research c. 27,000 online files in addition to thousands of studies on CD/DVD Emphasis on demography (state/federal censuses), economics, health, labor, election studies, attitudinal and behavioral studies, family life etc.
  • 4. • Consulting services to match user needs with appropriate data and statistical analysis software • Finding, accessing and using data • Current Cornell researchers can download archive files from the online catalog (search & browse) in formats compatible with statistical software • Data files are identified by a ‘traffic light’ icon that indicates usage level: • Green – downloadable by anyone • Yellow – downloadable from links in the catalog with CUWebAuth authentication (for use within the CISER research computing environment - CISERRSCH) – Cornell researchers can apply for a computing account • Red – data to be used under restriction (via CRADC or conditions imposed by data provider)
  • 6. CISER Data Archive maintains links to a range of social science data resources including: • Data Distributors and Producers: U.S. Government e.g. Dept. Agriculture, Dept. Commerce, Dept. Energy, Dept. Justice, Dept. Labor, Federal Agencies • Data Distributors and Producers: Other U.S. Sources • Data Distributors and Producers: International e.g. Eurostat, FAOSTAT, ILO, OECD, UN Statistics Division, World Bank • Data Libraries and Archives e.g. Harvard-MIT Data Center, UKDA, DANS, CESSDA • Social Science Research Institutes e.g. Odum Institute, Survey Research Institute • Online Reference Tools e.g. Boundary files, geocoding tools, SIC codes, data citation tools • State and Local Government data and statistical sources e.g. NY State Depts. Education, Health, Labor, State Data Center See URL: http://ciser.cornell.edu/ASPs/datasource.asp
  • 7. • Provides Cornell social science researchers with a repository for sharing and providing long-term preservation of their numeric/statistical research data • Participates in Cornell’s Research Data Management Service Group • Assists Cornell social science researchers with Research Data Management (RDM) plans • Provides Cornell social science researchers with support and expertise in obtaining and using restricted data
  • 8. Other social science research data resources: • Inter-University Consortium for Political and Social Research (ICPSR) • National Archive of Criminal Justice Data • Minority Data Resource Center • National Archive of Computerized Data on Aging • Roper Center for Public Opinion Archives • International Data Archives • CESSDA, UKDA, Eurostat • CESSDA catalog (DDI) provides a multi-lingual interface to datasets from member social science data archives across Europe • Non-Governmental Organizations • National / Governmental Statistical Agencies
  • 9. URLs: • CISER Data Archive Catalog: http://ciser.cornell.edu/ASPs/search.asp • ICPSR: www.icpsr.umich.edu/ • Roper Center for Public Opinion Research: http://www.ropercenter.uconn.edu/ • CESSDA: http://www.cessda.org/ • Eurostat: http://www.epp.eurostat.ec.europa.eu/
  • 10. Location & hours: CISER Data Archive is located at 391 Pine Tree Road, Ithaca. CISER is open 8.30am – 4.30pm (Mon-Fri); walk-in assistance is not always available, so appointments are recommended. Contacts: Tel.: (607) 255 4801 Email: ciser@cornell.edu
  • 11. Introduction to Research Data Management (RDM)
  • 12. Why Manage Research Data? Current research data management initiatives are based on three trends: The data deluge – exponential growth in volume of digital research artifacts created within academia (often created by publicly funded research) Data management is required by multiple disciplines Increasing perception of the value of data (data as commodity)
  • 13. What is Research Data Management? • RDM is an umbrella term to describe all aspects of planning, organising, documenting, storing and sharing research data. • It also takes into account issues such as documentation, data protection and confidentiality. • It provides a framework that supports researchers and their data throughout the course of their research and beyond. • It is one of the essential areas of responsible conduct of research
  • 14. Research Data Lifecycle Pink Colored Umbrellas Are Pretty Darned Rainproof
  • 15. Research Data Defined US Office of Management and Budget in its grants management circular A-110 defines research data as “the recorded factual material commonly accepted in the scientific community as necessary to validate research findings.” The KRDS2 study (Beagrie et al, 2009) define research data as ‘collections of structured digital data from any disciplines or sources which can be used by academic researchers to undertake their research or provides an evidential record of their research.’ RIN Classification* • Observational – real-time, unique, usually irreplaceable • Experimental – from lab equipment, expensive, often reproducible • Simulation – generated from models – model & metadata are as important as output data • Derived – resulting from processing or combining “raw” data. reproducible but expensive • Reference - a (static or organic) collection of smaller (peer-reviewed) datasets, probably published and curated * Stewardship of digital research data: a framework of principles and guidelines, Research Information Network, 2008. URL: http://tinyurl.com/l56gftx
  • 16. Research Data Defined • Research data, unlike other information types, is collected, observed, or created, for purposes of analysis to produce original research results. • Research data can be generated for different purposes and through different processes in a multitude of digital formats.
  • 17. Research data comes in many varied formats: Text - flat text files, Word, Portable Document Format (PDF), Rich Text Format (RTF), Extensible Markup Language (XML). Numerical - SPSS, Stata, Excel. Multimedia - jpeg, tiff, dicom, mpeg, quicktime. Models - 3D, statistical. Software - Java, C. Discipline specific - Flexible Image Transport System (FITS) in astronomy, Crystallographic Information File (CIF) in chemistry. Instrument specific - Olympus Confocal Microscope Data Format, Carl Zeiss Digital Microscopic Image Format (ZVI)
  • 18. Research data may include the following: • Documents (text, MS Word), spreadsheets • Lab books, field notes, diaries • Questionnaires, transcripts, codebooks • Audiotapes, videotapes, photographs, images • Slides, artefacts, specimens, samples • Collection of digital objects acquired & generated during the research process • Database contents (video, audio, text, images) • Models, algorithms, scripts • Contents of an application (input, output, logfiles for analysis software, schemas) • Methodologies, workflows • SOPs, protocols
  • 19. By managing your data you will: • ensure scientific integrity of research and aid replication • ensure research data and records are accurate, complete, authentic and reliable • increase your research efficiency • save time, effort and resources in the long run • enhance data security and minimise the risk of data loss • prevent duplication of effort by enabling others to use your data • meet funding grant requirements Note: It may also be important to manage research records (both digital & hardcopy) during and beyond the life of the project such as: correspondence (emails) grant applications technical reports research reports consent forms ethics applications
  • 20. What Do Funders Want? • timely release of data - once patents are filed or on (acceptance for) publication. • data shared openly - minimal or no restrictions if possible. • preservation of data - typically 5-10+ years if of long-term value. • data management plans See: NIH Data Sharing Policy: https://grants.nih.gov/grants/policy/data_sharing/ NSF Data Sharing Policy: http://www.nsf.gov/bfa/dias/policy/dmp.jsp
  • 21. Data Management Plan. What is it? Funding bodies require researchers to supply detailed, cost-effective plans for managing research data. These are called Data Management Plans. A DMP is a document which describes: • What research data will be created. • What policies (funding, institutional, legal) apply to the data. • What data management practices (backups, storage, access control, archiving) will be used. • What facilities and equipment are required (hard-disk space, backup server, repository). • Who will own the copyright and have access to the data. • How long-term preservation will be ensured after the original research is completed. The data management plan must be continuously maintained and kept up-to-date throughout the course of research.
  • 22. Why do we need one? It improves your research both now and later... •Data is often valuable for a long time! •Results of your research may outlast your project. •Will you use your data throughout your career? •Prevents loss of digital data and records. •Prevents loss of usefulness through media and software obsolescence. •Forgetting stuff! Good practice → Better research
  • 23. Why do we need one? •Ensure research integrity (and repeatability) through keeping better records. •People can trace your outcomes from data collection, through research methodology, through to results. •Maximises usefulness of data to fellow researchers. •Highlights how data was collected, quality controls, how people can and should use it (access and licensing). •Facilitates data use within collaboration. •Can help lead to subsequent research papers.
  • 24. Getting started with a DMP • Gain an understanding of terminology & issues. • Gain understanding of your project/community – Supervisor and colleagues – People in your School, i.e. IT Officers, Research Coordinator/Administrator • Talk to your supervisor about data authorship, IP, licensing, policies. • Keep it practical and simple; don't spend too much time. Leave gaps for what you don't know, investigate, and fill them in later. • Remember it is never finished! Review it regularly through the course of your research. CDL’s DMP Tool: https://dmp.cdlib.org/ Cornell University RDM Services Group - Writing a DMP: https://confluence.cornell.edu/display/rdmsgweb/data-management-planning-overview
  • 26. Benefits of organising your data Research data files and folders need to be labelled and organised in a systematic way so that: •Data files are not accidentally overwritten or deleted •Data files are distinguishable from each other within their containing folder •Data file naming prevents confusion when multiple people are working on shared files •Data files are easier to locate and browse •Data files can be retrieved by both creator and by other users •Data files can be sorted in logical sequence •Different versions of data files can be identified •If data files are moved to other storage platforms their names will retain useful context
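One way to get names that sort logically, expose versions and keep their context when moved is to generate them from a fixed pattern. The sketch below is only an illustration: the `project_description_date_version` pattern, the function name `build_filename` and the example values are assumptions, not a convention taken from the slides.

```python
import re
from datetime import date


def build_filename(project: str, description: str, when: date, version: int, ext: str) -> str:
    """Build a name of the (assumed) form project_description_YYYY-MM-DD_vNN.ext."""

    def slug(text: str) -> str:
        # Lowercase the text and collapse every run of other characters into a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # ISO dates and zero-padded versions sort correctly as plain text.
    return (
        f"{slug(project)}_{slug(description)}_"
        f"{when.isoformat()}_v{version:02d}.{ext.lstrip('.').lower()}"
    )


if __name__ == "__main__":
    print(build_filename("CRP 7201", "Household Survey", date(2014, 3, 19), 2, ".CSV"))
```

Because the date is written year-first and the version is zero-padded, an ordinary alphabetical sort of the folder also gives chronological and version order.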
  • 27. File Formats & Transformation • Files are based on either text or binary encoding. The former is both machine- and human-readable and the latter only readable by means of appropriate software. • Thus text files are less likely to become obsolete. Examples of file name extensions for these files are .txt, .csv and .por.  • Be aware of the file formats your data exists in – Does this format require a specific type of software? – Can others access the data in this format? – Can alternative formats be used? • Using widely available or open formats maximises the chances of your data being stable and usable
  • 28. File Formats & Transformation •When compressing your data files for storage or transportation you encode the information using fewer bits than the original representation. Commonly used tools are Zip and Tar (Tar bundles files into a single archive and is usually paired with a compressor such as gzip). •You may use the process of data normalisation. This means converting data from one format (e.g. proprietary) into another for use or preservation (e.g. ASCII). •If you convert or migrate your data files from one format to another, be aware of the potential risk of data loss or corruption and take appropriate steps to avoid/minimise it. •Watch out for backwards compatibility if software is upgraded
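A simple check when compressing or moving files is to compare a checksum of the original with a checksum of what comes back out. The following is a minimal sketch using Python's standard `zipfile` and `hashlib` modules; the function names and the choice of SHA-256 are assumptions made for illustration. It only covers lossless compression, not conversion between formats, where the bytes are expected to differ.

```python
import hashlib
import zipfile
from pathlib import Path


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 checksum of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


def zip_and_verify(source: Path, archive: Path) -> bool:
    """Compress one file into a ZIP archive and confirm it round-trips unchanged."""
    original = source.read_bytes()

    # Write the file into a new archive using DEFLATE compression.
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(source, arcname=source.name)

    # Read it back from the archive and compare checksums.
    with zipfile.ZipFile(archive) as zf:
        restored = zf.read(source.name)

    return sha256_of(original) == sha256_of(restored)
```

Recording the checksum alongside the archive also allows the same comparison to be repeated later, for example after copying the archive to another storage platform.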
  • 30. Documenting Data There are many reasons why you need to document your data: •To help you remember the details later •To help others understand your research •Verify your findings •Review your submitted publication •Replicate your results •Archive your data for access and re-use Some examples of data documentation are: •Laboratory notebooks •Field notes •Questionnaires
  • 31. Documenting Data Research data need to be documented at various levels: •Project level •File or database level •Variable or item level The term metadata (‘data about data’) is often used. The importance of metadata lies in the potential for machine-to-machine interoperability to assist location and access to data through search interfaces.
  • 32. Secure data storage: For the purposes of integrity and efficiency it is important that research data is stored securely & backed up regularly via: • Networked drives • Fileservers managed by department / school / IT Dept. • Stored in single, secure, accessible place – regular back-ups. • Personal computers / laptops • Convenient, temporary storage - should not be used for storing master copies. • Local drives may fail & laptops may get lost/stolen.
  • 33. • External storage devices • Hard drives, USB sticks, CDs, DVDs – low cost & portable BUT not recommended for long term storage. • Longevity not guaranteed – degradation over time. • Easily damaged or misplaced. • Not big enough for all research data – might need to use multiple discs/drives. • May pose a security threat. If USB sticks, DVDs, CDs are used for working data or extra back-up then: • Choose high quality products from reputable manufacturers. • Conduct regular checks to ensure media is not failing. • Periodically refresh data (i.e. copy to a new disc or drive). • Ensure confidential data is password protected / encrypted
  • 34. • Remote or online back-up services – services that provide an online system for storing and backing-up computer files e.g. Dropbox, Mozy, Humyo, A-Drive • Allow users to store and sync data files online and between computers. • Employ cloud computing storage facilities (e.g. Amazon S3). • Business model – first few GBs free, pay for more space.
  • 35. Backing-up Considerations for back-up policy: • Whether all data (full back-up), or only changed data will be backed-up (incremental back-up)? • How often full and incremental back-ups will be made? • How much hard-drive space or DVDs will be required to maintain this schedule? • If working with sensitive data, how will it be secured (and destroyed)? • What back-up services are available that meet these needs? • Who will be responsible for ensuring back-ups are available? Recommendation: Keep at least 3 copies of your data (e.g. original, external/local, and external/remote) and put in place a regular back-up procedure
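The difference between a full and an incremental back-up can be sketched in a few lines. The example below is an assumption-laden illustration, not a replacement for a managed back-up service: the function name `incremental_backup` is hypothetical, files are compared by SHA-256 checksum, deletions are not propagated, and the back-up folder is assumed to sit outside the source folder. The first run copies everything (a full back-up); later runs copy only new or changed files.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 checksum of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def incremental_backup(source_dir: Path, backup_dir: Path) -> list[str]:
    """Copy new or changed files from source_dir to backup_dir.

    Returns the relative paths that were copied on this run.
    """
    copied = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dest = backup_dir / rel
        # Skip files whose back-up copy already has identical contents.
        if dest.exists() and file_digest(dest) == file_digest(src):
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves timestamps
        copied.append(rel.as_posix())
    return copied
```

Running the function against two different destinations (for instance a local external drive and a remote mount) is one way to approach the three-copies recommendation.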
  • 36. Data Security The means of ensuring that data is kept safe from corruption and that access to it is suitably controlled. It is important to consider data security to prevent: • Accidental or malicious damage / modification to data. • Theft of valuable or irreplaceable data. • Breach of confidentiality agreements and privacy laws. • Release of data before it has been checked for accuracy and authenticity.
  • 37. Exercise 2. Data storage and Security
  • 38. Data Protection (also called data privacy) • In the US, there is no single, comprehensive federal (national) law regulating the collection and use of personal data. Instead, the US has a patchwork system of federal and state laws, and regulations that overlap, dovetail and may contradict one another. • The combination of an increase in cross-border data flow, together with the increased enactment of data protection statutes heightens the risk of privacy violations and creates a significant challenge for a data owner/distributor. Data protection is the relationship between: •collection and dissemination of data •technology •the public expectation of privacy and the legal and political issues surrounding them
  • 39. Rights and access • Intellectual property rights (IPR) can be defined as rights acquired over any work created or invented with the intellectual effort of an individual. • Facts are not copyrightable but the structure of a database could be. • As a researcher, you should clarify ownership of and rights relating to research data before a project starts. This includes the right of access and the right to make copies. • Data licences determine the terms and conditions of use by another, and may accompany a purchase or subscription. • Open data licences attempt to “set data free” by minimising and standardising the terms and conditions of re-use. Conditions may include attribution, non-commercial use, no derivative works, or ‘share alike’.
  • 40. Open Data Commons (ODC) have prepared a set of licences each with an accompanying statement which can be placed with your data on a webpage that points to your data. Open Data Commons: http://opendatacommons.org/
  • 41. Benefits of Sharing Data • Scientific integrity – publishing & citing data in published research papers can allow others to replicate, validate, or correct results, thus improving the scientific record. • Publicly funded research - there is a growing movement for making publicly funded research available to the public. • Funding mandates - US Funding Agencies are increasingly mandating data sharing so as to avoid duplication of effort and save costs. • Preserve research data for researchers’ own future use.
  • 43. Research Data MANTRA Partnership between: EDINA & Data Library, University of Edinburgh Institute for Academic Development Funded by JISC Managing Research Data Programme (Sept. 2010 – Aug. 2011) Aim was to develop online interactive open learning resources for PhD students and early career researchers that will: Raise awareness of the key issues related to research data management & contribute to culture change. Provide guidelines for good practice.
  • 44. Online Learning Module. Eight units with activities, scenarios and videos: • Research data explained • Data management plans • Organising data • File formats and transformation • Documentation and metadata • Storage and security • Data protection, rights and access • Preservation, sharing and licensing Four data handling practicals: SPSS, NVivo, R, ArcGIS Video stories from researchers in a variety of settings
  • 45. Online Learning Module • Delivered online – self-paced, available ‘anytime, anyplace’ • Emphasis on practical experience and active engagement via online activities • One hour per unit • Read and work through scenarios & activities (incl. videos etc) • CC licence to allow manipulation of content for re-use with attribution • Portable content in open standard formats (e.g. SCORM) • Research data MANTRA course: http://datalib.edina.ac.uk/mantra

Editor's notes

  1. Data, documentation and associated files (e.g. SAS, SPSS, Stata) are housed on the CISER file server. Files are downloaded from the catalog in ZIP compressed format. Cross-National Time Series data
  2. As CISER is an ICPSR member, researchers can gain access to data held in those CESSDA Archives that are themselves ICPSR members. CESSDA member organisations adhere to a Trans-border Data Access Agreement.
  3. European Community Household Panel Survey, European Union Labour Force Survey, Community Innovation Survey, European Health Interview Survey, Structure of Earnings Survey, European Union Statistics on Income and Living Conditions
  4. What about preserving?
  5. Observational – sensor data, survey or sample data, neuroimages – e.g. ocean temperature, voters' attitudes before an election, photographs of a supernova. Experimental – e.g. gene sequences, chromatograms, toroid magnetic field data, HPLC, gel electrophoresis, chemical reaction rates. Simulation – e.g. climate models, economic models, algorithms. Derived – e.g. text and data mining, compiled database, 3D models, maps. Reference – e.g. gene sequence databanks, chemical structures, spatial data portals.
  6. Funded by JISC as part of its UK programme, Managing Research Data, to develop online learning materials that help researchers manage their digital assets. IAD – set up to deliver training and development for postgraduate students and staff – via online courses, Virtual Learning Environments, transferable skills training
  7. Shareable Content Object Reference Model – XML-based