Best Practice in Data Management and Sharing
Mojtaba Lotfaliany; MD, PhDc
PhD Student @ Non-Communicable Disease Control, School of Population and Global Health,
University of Melbourne
Researcher @ Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine
Sciences, Shahid Beheshti University of Medical Sciences
First things first!
We deeply appreciate the contributions of the following organizations. Most of the
information in this presentation was derived from the Australian National Data
Service, the UK Data Service, the UK Data Archive and the valuable book
“Managing and Sharing Research Data: A Guide to Good Practice”.
Non Communicable Disease Unit
What is this presentation about?
• Research funders are increasingly mandating open access to
research data
• Governments internationally are demanding transparency in
research
• The economic climate is requiring much greater reuse of data
• Fear of data loss calls for more robust information security
practices.
• Journal publishers increasingly require submission of the data
upon which publications are based for peer review.
• Researchers and data users recognize the long-term value of
well-prepared data
All these factors mean that researchers will need to
improve, enhance and professionalize their research
data management skills to meet the challenge of
producing the highest quality shareable and reusable
research outputs in a responsible and efficient way.
Robust research data management techniques give researchers and data professionals
the skills required to deal with the rapid developments in the data management
environment.
This presentation gives a brief introduction to the most important data management and
data sharing skills.
It aims to help researchers implement data management (and sharing) policies
in order to maximize the openness of data and the transparency and
accountability of the research they support.
Introduction: What Is “Research Data”? and Data Lifecycle
Part 1:
• Why Manage Your Data?
• Formatting and organizing the data
• Storage and Security of Data
• Data documentation and meta data
• Quality Control
• Version controlling
• Working with sensitive data
• Controlled Vocabulary
• Centralized Data Management
Part 2:
• Data sharing
• What are publishers & funders saying about data sharing?
• Researchers’ Attitudes
• Benefits of data sharing
• Considerations before data sharing
• Methods of Data Sharing
• Uses of Shared Data and Their Limitations
• Data management plans
• Brief summary
• Acknowledgment, References
What is “Research Data”
What Is “Research Data”?
Research data is data that is collected, observed, or created, for purposes of analysis to
produce original research results.
• Observational
• Experimental
• Simulation
• Derived or compiled
• Reference or canonical
• Text or Word documents, spreadsheets
• Laboratory notebooks, field notebooks, diaries
• Questionnaires, transcripts, codebooks
• Audiotapes, videotapes
• Photographs, films
• Test responses
• Slides, artifacts, specimens, samples
• Collection of digital objects acquired and
generated during the process of research
• Data files
• Database contents including video, audio, text,
images
• Models, algorithms, scripts
• Contents of an application such as input,
output, log files for analysis software,
simulation software, schemas
• Methodologies and workflows
• Standard operating procedures and protocols
• Correspondence including electronic mail and
paper-based correspondence
• Project files
• Grant applications
• Ethics applications
• Technical reports
• Research reports
• Master lists
• Signed consent forms
How data differs across disciplines
RCSB Protein Data Bank
Australian Data Archive
Data Lifecycle
Data Management
Why Manage Your Data?
Effective research data management of medical, health and clinical
data is increasingly recognised as a critical part of the research
process. It enables:
• Trust in data you obtain for reuse from other sources
• Reproducibility of research through increasing veracity of data
• Increased quality of your research
• Strengthening of researchers’ reputation through increased
citations and reach of all research outputs
• Increased connectivity between all research
outputs, and researchers
• More efficient use of scarce research funds
• Data description for sharing and collaboration
• Reduced risk of loss or corruption of data
Why Manage Your Data?
By data management we mean all data practices,
manipulations, enhancements and processes that
ensure that research data are of a high quality, are
well organized, documented, preserved, sustainable,
accessible and reusable
Why Manage Your Data?
Video
As you watch the cartoon, jot down the data management mistakes you see.
How could those mistakes have been avoided?
Formatting and organizing the data
Choosing File Formats
• All digital data exist in a specific file format: the form
in which information is encoded so that a software
program can read and interpret it.
• A particular file format is usually linked to a specific
software program.
• If the same file is to be read by a different program
it may need to be converted.
Choosing File Formats
• Format best suited for data creation
• Format best suited for data analyses and other
planned uses;
• Format best suited for long-term sustainability and
sharing of data
Choosing File Formats
• Non-proprietary or open (CSV vs. MS Excel)
• Lossless format (TIFF vs. JPEG)
• Common, used by the research community (SPSS)
• Standard representation (ASCII, Unicode)
• Easy to track changes
• Easy to be converted without data loss
• Minimal human intervention
Data conversion
• To present the data
• To analyze the data in a different software package
• To convert images to text (OCR software)
• Data preservation
o After any conversion, check the data for errors,
changes or loss.
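The check after a conversion can be partly automated. The following is a minimal sketch, assuming a small tabular dataset converted from CSV to JSON; the function names and the sample data are hypothetical and only illustrate the idea of comparing the converted data with the original.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Read CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def convert_csv_to_json(csv_text):
    """Convert CSV text to a JSON string (the conversion being checked)."""
    return json.dumps(csv_to_records(csv_text))


def conversion_is_lossless(csv_text, json_text):
    """Return True if the converted data holds the same rows and values."""
    return csv_to_records(csv_text) == json.loads(json_text)


# Hypothetical sample data
original = "id,weight_kg\n1,70.5\n2,82.0\n"
converted = convert_csv_to_json(original)
print(conversion_is_lossless(original, converted))
```

A check like this only compares rows and values; labels, formats or metadata that the target format cannot hold still need to be reviewed by hand.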
File Names
• Sensible file names and well-organized folder structures
make it easier to find and keep track of data files.
• Develop a naming system that works for your project
and use it consistently.
• Good file names can provide useful cues to the content,
status and version of a file, can uniquely identify a file
and can help in classifying and sorting files.
Best Practice for File Naming
• Create meaningful but brief names
• Use file names to classify broad types of files
• Do not use spaces, dots and special characters such as $ or ? or !
• Use hyphens '-' or underscores '_' to separate logical elements in a
file name
• Avoid very long file names
• Reserve the 3-letter file extension for application-specific codes
that represent the file format, such as .doc, .xls, .mov and .tif
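The naming rules above can be checked automatically before files are stored. This is a rough sketch, not a standard tool: the function name, the messages and the 50-character limit are assumptions chosen for illustration (the slide only says to avoid very long names).

```python
import re

MAX_LENGTH = 50  # assumed limit; adjust to your project's convention


def check_file_name(name):
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append("too long")
    if " " in name:
        problems.append("contains spaces")
    if name.count(".") != 1:
        problems.append("should contain exactly one dot, before the extension")
    # Anything other than letters, digits, dot, underscore, hyphen (or space,
    # which is reported separately) counts as a special character.
    if re.search(r"[^A-Za-z0-9._ -]", name):
        problems.append("contains special characters")
    return problems


print(check_file_name("Biodiv_H20_heatExp_2005_2008.csv"))
print(check_file_name("final data v2.1!.xls"))
```

The first name passes all checks; the second is flagged for spaces, an extra dot and a special character.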
Best Practice for File Structure
• Think carefully how best to structure files in folders
• When working in collaboration, the need for an orderly
structure is even higher.
• Consider the best hierarchy for files, deciding whether a
deep or shallow hierarchy is preferable.
Best Practice for File Structure
Research project files could be organized according to:
• Research activity, such as interviews, surveys or focus
groups;
• Data type, such as images, text or database;
• Kind of material, for example, publications, deliverables or
documentation.
Organize Files Logically
Make sure your file system is logical and efficient.
Example folders (from the slide figure): Project 1, Time_point1, Time_point2, Biomarkers, Anthropometrics
Example file names:
• Biodiv_H20_heatExp_2005_2008.csv
• Biodiv_H20_predatorExp_2001_2003.csv
• Biodiv_H20_planktonCount_start2001_active.csv
• Biodiv_H20_chla_profiles_2003.csv
File name elements, in order: project name, location, experiment name, date, file format
Storage and Security of Data
Best Practice in Storing Data and Preservation
• Store data uncompressed in non-proprietary or
open standard formats for long-term software
readability
• Copy or migrate data files to new media every two
to five years
• Check the data integrity of stored data files at
regular intervals.
• Organize and label stored data clearly so they
are easy to locate and physically accessible
• Ensure that areas and rooms for storage of
digital or non-digital data are fit for the
purpose, structurally sound and free from the
risk of flood and fire
• Create digital versions of paper-based data or
information in PDF/A format for long-term
preservation and storage.
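Checking the integrity of stored files at regular intervals is commonly done with checksums. Below is a minimal sketch, assuming SHA-256 checksums kept in a simple in-memory manifest; the function names and the sample file are hypothetical, and a real project would save the manifest alongside the data.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Record a checksum for every file under a folder."""
    folder = Path(folder)
    return {
        str(path.relative_to(folder)): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def changed_files(folder, manifest):
    """List files that are missing or whose checksum differs from the manifest."""
    current = build_manifest(folder)
    return sorted(name for name in manifest if current.get(name) != manifest[name])


# Demonstration in a temporary folder with a hypothetical data file
with tempfile.TemporaryDirectory() as folder:
    data_file = Path(folder) / "survey_2016.csv"
    data_file.write_text("id,age\n1,34\n")
    manifest = build_manifest(folder)
    before = changed_files(folder, manifest)   # nothing has changed yet
    data_file.write_text("id,age\n1,43\n")     # simulate silent corruption
    after = changed_files(folder, manifest)

print(before, after)
```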
Backup Your Data
• Reduce the risk of damage or loss
• Use multiple locations (here, near, far)
• Create a backup schedule
• Use reliable backup medium
• Test your backup system (i.e., test file recovery)
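The last point, testing file recovery, can be sketched in a few lines. This example assumes the "here, near, far" locations are ordinary folders (simulated here inside a temporary directory); the function names are hypothetical, and real off-site backups would use whatever service your institution provides.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def back_up(source, backup_dirs):
    """Copy a file into each backup location and return the paths of the copies."""
    copies = []
    for backup_dir in backup_dirs:
        Path(backup_dir).mkdir(parents=True, exist_ok=True)
        copies.append(Path(shutil.copy2(source, backup_dir)))
    return copies


def recovery_works(source, copies):
    """Test recovery: every copy must match the source byte for byte."""
    return all(filecmp.cmp(source, copy, shallow=False) for copy in copies)


# Demonstration with simulated "here", "near" and "far" locations
with tempfile.TemporaryDirectory() as root:
    source = Path(root) / "here" / "results.csv"
    source.parent.mkdir()
    source.write_text("id,value\n1,5.2\n")
    copies = back_up(source, [Path(root) / "near", Path(root) / "far"])
    backups_ok = recovery_works(source, copies)

print(backups_ok)
```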
Physical data security
• Controlling access to rooms and buildings where
data, computers or media are held;
• Logging the removal of, and access to, media or
hardcopy material in store rooms;
• Transporting sensitive data only under exceptional
circumstances.
Network Security
• Not storing confidential data such as those
containing personal information on servers or
computers connected to an external network,
particularly servers that host Internet services
• Using firewall protection and applying security-related upgrades
and patches to operating systems to avoid viruses
and malicious code.
Security of computer systems
• Locking computer systems with a password and installing a firewall
system
• Protecting servers from power surges with line-interactive
uninterruptible power supply (UPS) systems
• Imposing non-disclosure agreements for managers or users of
confidential data
• Not sending personal or confidential data via email or other file
transfer means without first encrypting them
• Remembering that file-sharing services such as Google Docs and
Dropbox may not be suitable for certain types of information.
Data Encryption
Access controlling and security
• Needing specific authorization from the data owner to access
data
• Placing confidential data under embargo for a given period of
time until confidentiality is no longer pertinent
• Providing access to approved researchers only
• Providing secure access to data through enabling remote
analysis of confidential data but excluding the ability to
download data
• Mixed levels of access regulations
Mixed levels of access regulations
Data documentation and Meta Data
Data documentation
• The collective term 'data documentation' includes
information on why and how data were created,
prepared or digitized, what they mean, what their
content and structure are, and any alterations or
coding that may have taken place.
• Good documentation is critical for understanding
data in the short, medium and longer term; and is
vital for successful long-term data preservation.
Data documentation levels
Data documentation requires descriptive material at two
levels.
• The high-level information, commonly known as study-level
documentation, describes the research project, the data creation
processes, rights and general context.
• The data-level information covers descriptions and
annotations at the file and within-file level.
Metadata are a specific subset of data documentation that
provides structured, searchable information.
Good study-level data documentation includes:
• Research design and context of data collection
• Data collection methods
• Structure of data files, with number of cases,
records, files and variables, as well as any
relationships among such items;
• Secondary data sources used and provenance
• Data validation, checking, proofing, cleaning and
other quality assurance procedures
• Modifications made to data over time since their
original creation and identification of different
versions of datasets;
• Information on data confidentiality, access and
any applicable conditions of use;
• Publications, presentations and other research
outputs that explain or draw on the data.
Data-level data documentation
Metadata can be generated manually, or it can be
created automatically.
Within a database or in separate files
Within-file metadata
Data Dictionary
Project Documentation
• Context of data collection
• Data collection methods
• Structure, organization of data files
• Data sources used
• Data validation, quality assurance
• Transformations of data from the raw data through analysis
• Information on confidentiality, access and use conditions
Dataset Documentation
• Variable names and descriptions
• Explanation of codes and schemas used
• Algorithms used to transform data
• File format and software (including version) used
Data Dictionary Structure
Good data-level data documentation (for
tabular data) includes:
• Names, labels and descriptions
• Value code labels
• Coding and classification schemes
• Codes for missing values
• Derived data
• Weighting and grossing variables
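A data dictionary covering several of these items can be kept in a simple machine-readable form. The sketch below is illustrative only: the variable names, labels, value codes and missing-value codes are hypothetical, not taken from any real study.

```python
# A small data dictionary: one entry per variable in a tabular dataset.
DATA_DICTIONARY = {
    "sex": {
        "label": "Sex of participant",
        "values": {1: "Male", 2: "Female"},   # value code labels
        "missing": [9],                        # code for missing values
    },
    "weight_kg": {
        "label": "Measured weight in kilograms",
        "values": None,                        # continuous, no value codes
        "missing": [-99],
    },
}


def describe(variable, value):
    """Translate a stored value into a readable description."""
    entry = DATA_DICTIONARY[variable]
    if value in entry["missing"]:
        return f"{entry['label']}: missing"
    if entry["values"]:
        return f"{entry['label']}: {entry['values'][value]}"
    return f"{entry['label']}: {value}"


print(describe("sex", 2))
print(describe("weight_kg", -99))
```

Keeping labels and codes in one place means the same dictionary can be used both to document the dataset and to decode values during analysis.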
Quality Control
Data Quality Control
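• Data are of high quality if they are fit for their intended uses in
operations, decision making and planning.
• Under the ISO 9000:2015 definition of quality, data quality is the degree to
which a set of characteristics of the data fulfils requirements, such as: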
• Completeness
• Validity
• Accuracy
• Consistency
• Availability
• Timeliness
Data Quality Control in Data Entry
• Calibration of instruments
• Taking multiple measurements, observations or
samples
• Checking the truth of the record with an expert;
• Using standardized methods and protocols for
capturing observations
• Customize questions
Data Checking
During data checking, data are edited, cleaned, verified,
cross-checked and validated.
• Double-checking coding of observations or responses and
out-of-range values
• Checking data completeness
• Verifying random samples of the digital data against the
original data
• Double entry of data
• Statistical analyses such as frequencies, means, ranges or
clustering to detect errors and anomalous values
• Proof-reading transcriptions
• Peer review
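Some of these checks are easy to script. The following is a minimal sketch of three of them (out-of-range values, completeness and double entry), assuming records held as a list of dictionaries; the field names, ranges and sample records are hypothetical. Row numbers are counted from zero.

```python
def out_of_range(records, field, low, high):
    """Return the row numbers whose value falls outside [low, high]."""
    return [i for i, row in enumerate(records)
            if row[field] is not None and not low <= row[field] <= high]


def incomplete(records, required):
    """Return the row numbers with a missing (None) value in a required field."""
    return [i for i, row in enumerate(records)
            if any(row.get(field) is None for field in required)]


def double_entry_mismatches(first, second):
    """Compare two independent entries of the same data, row by row."""
    return [i for i, (a, b) in enumerate(zip(first, second)) if a != b]


# Hypothetical data entered twice by two different people
entry_1 = [
    {"id": 1, "age": 34, "sbp": 120},
    {"id": 2, "age": 430, "sbp": 135},   # likely typing error
    {"id": 3, "age": 51, "sbp": None},   # missing measurement
]
entry_2 = [
    {"id": 1, "age": 34, "sbp": 120},
    {"id": 2, "age": 43, "sbp": 135},
    {"id": 3, "age": 51, "sbp": None},
]

print(out_of_range(entry_1, "age", 0, 120))
print(incomplete(entry_1, ["age", "sbp"]))
print(double_entry_mismatches(entry_1, entry_2))
```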
Data Quality Control
• Misleading data
• Duplicate data
• Incorrect data
• Inaccurate data
• Non-integrated data
• Data that violates business rules
• Data without a generalized formatting
• Incorrectly punctuated or spelled data
Data Quality Control
• Manually?
• OpenRefine (formerly Google Refine) is a valuable
open source tool that is similar to Excel but more
powerful. You can use it to: record data; manipulate
data; clean up dirty data; and to transform datasets.
• Other alternatives
Version controlling and tracking
What is a version?
• A version is “a particular form of something differing
in certain respects from an earlier form or other
forms of the same type of thing”.
• In the case of research data, a new version of a
dataset may be created when an existing dataset is
reprocessed, corrected or appended with additional
data.
• Versioning is one means by which to track changes
associated with ‘dynamic’ data that is not static over
time.
What is a version?
• Scenario 1: a new observation is created and it
should be added to the dataset
• Scenario 2: an existing observation is removed and it
should be deleted from the dataset
• Scenario 3: an error was identified in one of the
existing observations stored in the dataset and this
error must be corrected.
Version controlling and tracking
• Version information makes a revision of a dataset uniquely
identifiable.
• Uniqueness can be used by researchers to determine whether
and how data has changed over time and to determine
specifically which version of a dataset they are working with.
• Explicit versioning allows for repeatability in research, enables
comparisons, and prevents confusion.
Tools for version controlling
Version control tables
Best Practice in Version Controlling
• Decide how many versions of a file to keep
• Identify milestone versions to keep
• Uniquely identify different versions of files using a
systematic naming convention, such as using version
numbers or dates
• Record changes made to a file when a new version
is created
• Record relationships between items where
needed
• Track the location of files if they are stored in a
variety of locations
• Regularly synchronize files in different locations
• Identify a single location for the storage of
milestone and master versions.
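A systematic naming convention and a version control table can be combined in a small helper. This is a sketch under stated assumptions: file names are assumed to end in _vNN before the extension (for example survey_v01.csv), and the table is a plain list of dictionaries; both conventions and all names are hypothetical.

```python
import re
from datetime import date


def next_version_name(file_name):
    """Return the file name with its version number increased by one.

    Assumes names end in _vNN before the extension, e.g. survey_v01.csv.
    """
    match = re.fullmatch(r"(.+)_v(\d+)(\.\w+)", file_name)
    if match is None:
        raise ValueError(f"no version number found in {file_name!r}")
    stem, number, extension = match.groups()
    width = len(number)  # keep the same zero padding
    return f"{stem}_v{int(number) + 1:0{width}d}{extension}"


def record_change(table, file_name, author, change, when=None):
    """Append one row to a version control table (a list of dictionaries)."""
    table.append({
        "file": file_name,
        "date": (when or date.today()).isoformat(),
        "author": author,
        "change": change,
    })
    return table


new_name = next_version_name("survey_v01.csv")
table = record_change([], new_name, "A. Researcher",
                      "Corrected age for participant 2", date(2017, 1, 5))
print(new_name)
print(table)
```

For code and text files, a dedicated version control system does the same job more reliably; a table like this is mainly useful for data files managed by hand.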
Working with sensitive data
What is sensitive data?
Sensitive data are data that can be used to identify an
individual, species, object, or location that introduces a
risk of discrimination, harm, or unwanted attention.
What is sensitive data?
Some examples but not all:
Data de-identification
A person's identity can be disclosed from:
• Direct identifiers such as names, addresses, postcode
information, telephone numbers or pictures
• Indirect identifiers which, when linked with other
publicly available information sources, could identify
someone, for example information on workplace,
occupation or exceptional values of characteristics
like salary or age
Data de-identification
• Removing direct identifiers, e.g. name or address
• Aggregating or reducing the precision of information or a variable, e.g.
replacing date of birth by age groups
• Generalizing the meaning of detailed text, e.g. replacing a doctor's detailed
area of medical expertise with an area of medical specialty
• Using pseudonyms
• Restricting the upper or lower ranges of a variable to hide outliers, e.g. top-
coding salaries
• Considering Statistical Disclosure Control (SDC) techniques
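Several of the techniques above can be illustrated in code. The sketch below is a simplified example, not a complete de-identification procedure: the column names, the salary cap, the ten-year age bands and the keyed-hash pseudonym are all assumptions, and applying them does not by itself guarantee that individuals cannot be re-identified.

```python
import hashlib
import hmac

DIRECT_IDENTIFIERS = {"name", "address", "phone"}   # assumed column names
SALARY_CAP = 150000                                 # assumed top-coding threshold


def age_group(age, width=10):
    """Replace an exact age by a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def pseudonym(identifier, secret):
    """Derive a stable pseudonym from an identifier and a project secret."""
    digest = hmac.new(secret.encode("utf-8"), identifier.encode("utf-8"),
                      hashlib.sha256).hexdigest()
    return "P" + digest[:8]


def de_identify(record, secret):
    """Return a copy of a record with the techniques above applied."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["id"] = pseudonym(str(record["id"]), secret)    # pseudonym
    cleaned["age"] = age_group(record["age"])               # reduced precision
    cleaned["salary"] = min(record["salary"], SALARY_CAP)   # top-coding
    return cleaned


# Hypothetical record
person = {"id": 17, "name": "Jane Doe", "address": "1 Example St",
          "age": 34, "salary": 480000}
result = de_identify(person, "project-secret")
print(result)
```

The project secret must be stored separately from the shared data; anyone who holds it can recompute the pseudonyms.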
Data de-identification
Read more about techniques
• 'Preparing raw clinical data for publication: guidance for journal editors,
authors, and peer reviewers'.
• The UK Data Service outlines approaches for de-identifying quantitative
and qualitative data.
Controlled Vocabulary
Controlled Vocabulary
• Controlled vocabularies ensure shared understanding of the
terminologies used in taxonomies and classifications.
• Using established vocabularies promotes interoperability,
discovery and re-use of data.
• Goal of Controlled vocabulary
Controlled Vocabulary
Video
Controlled Vocabulary examples
METeOR
METeOR is Australia’s repository for national metadata standards for health, housing
and community services statistics and information.
An example: Person—weight (measured), total kilograms N[NN].N
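A standard format such as the one in the METeOR example can be enforced when data are entered. The sketch below assumes that N[NN].N means one to three digits, a decimal point and one digit; that reading of the notation, and the function name, are assumptions for illustration rather than an official METeOR tool.

```python
import re

# N[NN].N read as: one to three digits, a decimal point, one digit (assumption).
WEIGHT_FORMAT = re.compile(r"\d{1,3}\.\d")


def matches_weight_format(text):
    """Return True if a recorded weight follows the assumed N[NN].N format."""
    return WEIGHT_FORMAT.fullmatch(text) is not None


print(matches_weight_format("72.5"))
print(matches_weight_format("72"))
```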
Have your own controlled Vocabulary
Controlled Vocabulary
1. Find and learn about controlled vocabularies relevant
to research.
2. Access those vocabularies and reuse them in your
community.
3. Integrate vocabularies into your local information
systems at a technical level.
4. Upload and describe a vocabulary to share with others.
5. Make a vocabulary machine readable (more easily
integrated into other's systems).
6. Create new or import existing vocabularies
and manage them with your community's input.
A good example
The Australian Longitudinal Study of Ageing
Centralized Data Management
Centralized Data Management
• researchers can share good practice and data management experiences with each
other
• building capacity, collective knowledge and resources for the center
• new researchers can immediately implement good data practices from this shared
expertise
• a uniform approach to data management by creating standard data policies and
procedures
• keeping track of projects and owners of data over time, especially when researchers
come and go
• storing and backing up data in a central location;
• making researchers and staff aware of duties, responsibilities, funder and legal
requirements relating to research data, with easy access to relevant information
• ensuring that data management is costed into funding proposals.
A Centralized Data Management may include:
• what the data mean
• how they were created
• where they were obtained
• who owns them
• who has access, use and editing rights
• who is responsible for managing them
• storage and backup strategies
• data quality control procedures
• different versions of files
• how they will or can be shared
A Centralized Data Management may include:
• Acts and Regulations and local statement or policy on data sharing
• codes of practice or professional standards relevant to research data
• exemplar data management plans
• a statement of institutional IT data management and existing backup
procedures
• security policy for data storage and data format recommendations
• quality control standards for data collection and data entry;
• file-naming and version control guidance
• template consent forms and information sheets
• example ethical review forms and data anonymization guidelines
• confidentiality agreements for data handlers.
A Centralized Data Management, How?
• An institutional or departmental drive where access can be
provided to external researchers, for example, through
remote access via virtual private network (VPN) techniques
• A secure file transfer protocol (FTP) server
• A virtual Research Environment (VRE) or portal
environment.
A Centralized Data Management, How?
• A content management system such as Drupal
• Cloud-based file-sharing areas such as Dropbox,
Google Docs and Google Drive
• A data repository such as DSpace, Fedora, EPrints,
CKAN or cloud-based figshare.
Part 2: Data sharing
Open / Shared / Closed: The world of data
Video
What are publishers & funders saying
about data sharing?
The Data Sharing Agenda
Organization for Economic Cooperation and Development (OECD)
Principles and Guidelines for Access to Research Data from
Public Funding:
Publicly funded research data are a public good, produced in the
public interest, and should be made openly available with
as few restrictions as possible in a timely and responsible
manner without harming intellectual property (OECD, 2007).
The Berlin Declaration on Open Access to Knowledge in the
Sciences and Humanities:
The Berlin Declaration called for promoting knowledge
dissemination through the open access paradigm via the
internet, which requires the worldwide web to be sustainable,
interactive and transparent, with openly accessible and
compatible content and tools (Berlin Declaration, 2003).
The High Level Expert Group on Scientific Data:
Noting the rising tide of data, the group proposed that we are on the
verge of a great new leap in scientific capability, fuelled by data,
with a need for a scientific e-infrastructure that supports
seamless access, use, reuse and trust of data (European
Commission, 2010).
The report sketches the benefits and costs of accelerating the
development of a fully functional e-infrastructure for scientific
data. Open infrastructure, open culture and open content need
to go hand in hand.
Data sharing policies of major medical funders
Journals and Publishers
• Data sharing policies are becoming increasingly
common in Australia and internationally.
• More and more journal publishers are asking
authors to make the data underpinning a journal
article available.
New journal data policies
Researchers’ Attitudes
Researcher motivations for sharing data
Why some researchers remain reluctant to
share their own research data
• 42% Intellectual property or confidentiality issues
• 36% My funder/institution does not require data sharing
• 26% I am concerned that my research will be scooped
• 26% I am concerned about misinterpretation or misuse
• 23% Ethical concerns
• 22% I am concerned about being given proper citation credit or
attribution
• 21% I did not know where to share my data
• 20% Insufficient time and/or resources
• 16% I did not know how to share my data
• 12% I don’t think it is my responsibility
• 12% I did not consider the data to be relevant
• 11% Lack of funding
• 7% Other
Why some researchers remain reluctant to
share their own research data
• My data are not of interest or use to anyone else.
• I want to publish my work before anyone else sees my data.
• I have not got the time or money to prepare data for sharing.
• If I ask my respondents for consent to share their data then
they will not agree to participate in the study.
• Other researchers would not understand my data at all or
may use it in a wrong way
Data sharing trends by country
Source: http://www.acscinf.org/PDF/Giffi-%20Researcher%20Data%20Insights%20--%20Infographic%20FINAL%20REVISED.pdf
Benefits of data sharing
Benefits for researchers
• Increases visibility of scholarly work;
• Likely to increase citation rates;
• Enables new collaborations;
• Encourages scientific enquiry and debate;
• Promotes innovation and potential new data uses;
• Establishes links to next generation of researchers.
Benefits for research funders:
• Promotes primary and secondary use of data;
• Makes optimal use of publicly funded research;
• Avoids duplication of data collection;
• Maximizes return on investment.
Benefits for the scholarly community
• Maintains professional standards of open inquiry;
• Maximizes transparency and accountability;
• Promotes innovation through unanticipated and new uses of
data;
• Enables scrutiny of research findings;
• Improves quality from verification, replication and
trustworthiness;
• Encourages the improvement and validation of research
methods;
• Provides resources for teaching and learning.
Benefits for research participants
• Allows maximum use of contributed information;
• Minimizes data collection on difficult-to-reach or
over-researched populations;
• Allows participants' experiences to be understood
as widely as ethically possible.
Benefits for the public
• Advances science to the benefit of society;
• Adopts emerging norms such as open access
publishing;
• Is, and appears to be, open and accountable;
• Complies with openness laws and regulations.
Considerations before data sharing
Considerations before data sharing
• Good data management
• Meeting ethical and legal obligations
• Intellectual property rights
• Data licensing
• Meta data schema and cross-walking
Good data management
• Data can only be shared if they are of high
quality, well-curated, well-documented, and
can be referenced and indexed.
• Data integrity translates as accuracy and consistency
and is ensured through quality control.
Legal obligations
Legislation that may impact on the sharing of data:
• Privacy Act 1988
• Human Rights Act 2004
• Freedom of Information Act 1982
• The Freedom of Information Amendment (Reform)
Act 2010
Human Research Ethics
Researchers should:
• Inform participants how research data will be
stored, preserved and used in the long-term
• Inform participants how confidentiality will be
maintained, e.g. by anonymizing data
• Obtain informed consent, either written or verbal,
for data sharing
Levels of consent
• ‘Specific': limited to the specific project under consideration
• ‘Extended': given for the use of data or tissue in future
research projects that are either (i) an extension of, or closely
related to, the original project; or (ii) in the same general
area of research (for example, genealogical, ethnographical,
epidemiological, or chronic illness research);
• ‘Unspecified': given for the use of data or tissue in any future
research.
Intellectual Property Rights
• In most research institutions, such as universities, the
institution owns IP rights arising from research undertaken
by employees in the course of their employment.
• A research funder may also wish to exert some claim over
rights, although, in most cases, IP rights are attributed to the
researcher unless an output becomes commercially viable.
• If a university research project has commercial collaborators
there may be joint IP rights in the research outputs, which
are best handled via consortium agreements or legal
contracts.
Intellectual Property Rights
• Copyright and exemptions under fair dealing: copyright is an
intellectual property right assigned automatically to the
creator.
• Copyright cannot be taken away without consent and cannot
be abused without the possibility of legal action ensuing.
• Most research outputs, including spreadsheets, publications,
textual files, reports and computer programs, fall under
literary work and are therefore protected by copyright.
The Freedom of Information Legislation (FOI)
• People have the right to request access to recorded
information held by public sector organizations.
• This can include research data held by universities or
research institutions.
• Many countries have some form of Freedom of Information
legislation, which is designed to ensure accountability and
good governance in public authorities.
• Research data can be requested under the FOI Act and legally
supplied to anyone, but copyright and IP rights to such data
remain with the original researcher.
What is a license? Why apply a license?
• When considering sharing your data, you need to consider
how you want your data to be reused by other researchers or
students.
• You can specify this by licensing the data to match the
intended uses.
• The data publisher, be it a data center, archive or repository,
usually does not expect to have rights in the data collections it
distributes or provides access to.
• Rather, a researcher or data creator will retain the copyright
in their data and give the center a non-exclusive license to
redistribute the data.
What is a license? Why apply a license?
• All copyright holders with some claim over the data collection
need to agree to the terms of deposit.
• Without this license agreement in place, a data center or
institutional repository cannot legally provide access to the
data.
AusGOAL Framework
How do I apply a license?
• You must ‘own’ the data to apply the license
• Look at your institution's IP policies
• When partnering, agree before collecting the data
who can apply the license and what that license will be
• Include this information in your HREC (Human Research Ethics Committee) application
How open can I be?
• Consent? (For what?)
• Potential for harm/discrimination?
• Data modified to address identification, limit harm?
• HREC approval?
Metadata schema and cross-walking
• A metadata standard is a schema that has been
formally approved and published (e.g. ANZLIC and DDI).
• Numerous metadata standards exist and the
standard chosen to describe resources such as
research data should be appropriate to the project
or discipline.
• Directory of Disciplinary Metadata
Metadata cross-walking
• Many contributors use different metadata schemas
in creating their research data records.
• The records of each contributor need to be ‘cross-walked’.
• A schema crosswalk is a table that shows equivalent
elements in more than one database schema.
• It maps the elements in one schema to the equivalent
elements in another schema.
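A crosswalk of this kind can be sketched as a simple lookup table. The example below is a minimal illustration, assuming a hypothetical local schema on the left and Dublin Core-style element names on the right; `CROSSWALK` and `crosswalk_record` are illustrative names, not part of any standard.

```python
# Crosswalk table: element in a hypothetical local schema -> equivalent
# Dublin Core-style element in the target schema.
CROSSWALK = {
    "dataset_name": "title",
    "principal_investigator": "creator",
    "collection_date": "date",
    "licence": "rights",
}


def crosswalk_record(record: dict, mapping: dict) -> tuple:
    """Rename mapped elements and list the elements that have no equivalent."""
    converted, unmapped = {}, []
    for element, value in record.items():
        if element in mapping:
            converted[mapping[element]] = value
        else:
            unmapped.append(element)
    return converted, unmapped


if __name__ == "__main__":
    # Made-up record for illustration.
    local = {
        "dataset_name": "Cohort baseline survey",
        "principal_investigator": "A. Researcher",
        "internal_code": "X-17",
    }
    converted, unmapped = crosswalk_record(local, CROSSWALK)
    print(converted)
    print("No equivalent element for:", unmapped)
```

Reporting the unmapped elements matters in practice, because a crosswalk is rarely complete and silently dropped fields are hard to notice later.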
Metadata cross-walking
• The flexible structure of XML makes it possible to
convert data from one metadata standard to
another using an XSLT stylesheet.
• XSLT (Extensible Stylesheet Language
Transformations) is a language for transforming
XML documents into other XML documents.
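The XML-to-XML idea can be illustrated without an XSLT processor. Python's standard library does not include one, so the sketch below swaps in `xml.etree.ElementTree` to show the same kind of conversion; it is not XSLT itself. The source record, the `ELEMENT_MAP` table and the `transform` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source record and element mapping, for illustration only.
SOURCE_XML = """<record>
  <dataset_name>Cohort baseline survey</dataset_name>
  <principal_investigator>A. Researcher</principal_investigator>
  <internal_code>X-17</internal_code>
</record>"""

ELEMENT_MAP = {"dataset_name": "title", "principal_investigator": "creator"}


def transform(source_xml: str, element_map: dict, root_tag: str = "metadata") -> str:
    """Build a new XML document containing only the mapped elements, renamed."""
    source_root = ET.fromstring(source_xml)
    target_root = ET.Element(root_tag)
    for child in source_root:
        if child.tag in element_map:
            target = ET.SubElement(target_root, element_map[child.tag])
            target.text = child.text
    return ET.tostring(target_root, encoding="unicode")


if __name__ == "__main__":
    print(transform(SOURCE_XML, ELEMENT_MAP))
```

An XSLT stylesheet expresses the same mapping declaratively as templates, which is usually easier to share between repositories than a script.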
Methods of Data Sharing
The crucial role of data repositories
• Informal sharing
• Specialist data centers, archive or repository
• An institutional repository
• Submitting to a journal to support a publication
• Publishing in a data journal
• Dissemination via a project or institutional website
• Self-publishing via a cloud-based system
Informal sharing
Data can be shared with colleagues and trusted
collaborators or upon request.
• Easy
• No citation
• Little credit
• Not easy to find the appropriate data
• No data preservation
• Duplication of effort
Specialist data centers, archive or repository
• Repositories enable discovery of data by publishing
data descriptions ("metadata") about the data they
hold - like a library catalogue describes individual
materials held in a library.
• You can publish a description (i.e. the metadata) of
your data without making the data itself openly
accessible, which enables you to place conditions
around access to the data.
Specialist data centers, archive or repository
• Assurance that data meet set quality standards
• Long-term preservation of data in standard file
formats which can be upgraded when needed due
to software upgrades or changes
• Safe-keeping of data in a secure environment with
the ability to control access where this is required
• Regular data backups
• Online resource discovery of data through data
catalogues
Specialist data centers, archive or repository
• Access to data in popular file formats
• Licensing arrangements to acknowledge data rights
and appropriate handling of confidential data
• Standard citation mechanism to acknowledge data
creation
• Promotion of data to many users
• Monitoring of the secondary usage of data
• Management of access to data and user queries on
behalf of the data owner
An institutional repository
• Wide visibility for the institution's
scholars
• Meeting security and ethical obligations
• Concerns about data preservation
• Concerns about data shareability
Access control
• Data centers typically liaise with the researchers who
own the data in selecting the most suitable type of
access for data.
• Access regulations should always be proportionate to the
kind of data and confidentiality involved.
• Access conditions which require that the data center
contact the researcher directly about each particular
request may result in extended delays before access is
granted.
Data portals
Data portals or aggregators draw together
research data records from a number of repositories:
• Research Data Australia (RDA) aggregates records
from over 100 Australian research repositories
• re3data: the largest and most comprehensive registry of
data repositories available on the web
• Subject-specific portals
Journals and Data Publishing
• As supplementary material (mandatory or optional)
• Data Paper or Data Article
• Metadata or data
• Data journals or regular journals
• Data Papers are subject to full peer review
• Quality control and technical reviews
Project websites and Cloud space
• Project websites and Linked Open Data: project
websites can provide easy immediate storage and
simplified access to research data
• No sustainability for the longer-term.
• Difficult to control who is using data and how
• Needs a backup and exit plan for sharing data.
• This should be viewed as a short-term impact-
generating facility and not a long-term data storage
solution.
What is data citation?
• Data citation refers to the practice of providing a
reference to data in the same way as researchers
routinely provide a bibliographic reference to outputs
such as journal articles, reports and conference
papers.
• Citing data is increasingly being recognized as one of
the key practices leading to recognition of data as a
primary research output.
The importance of data citation
• Acknowledges the author's sources;
• Makes identifying data easier;
• Promotes the reproduction of research results;
• Makes it easier to find data;
• Allows the impact of data to be tracked;
• Provides a structure that recognizes and can reward
data creators.
Data citation
Data citation conventions
• Uniform Resource Names (URN)
• Uniform Resource Locators (URL)
• Digital Object Identifiers (DOI)
Some other similar identifiers:
• International Standard Book Number (ISBN)
• The Open Researcher and Contributor ID (ORCID)
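To make the role of a DOI in a data citation concrete, here is a minimal sketch. The citation pattern (creators, year, title, publisher, identifier) is one common layout and is an assumption here, since repositories and journals may prescribe their own style; `doi_url` and `format_data_citation` are hypothetical helper names, and the DOI in the example is made up. The only external fact relied on is that a DOI resolves through `https://doi.org/`.

```python
def doi_url(doi: str) -> str:
    """Turn a DOI (bare, 'doi:'-prefixed, or already a URL) into a doi.org link."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi


def format_data_citation(creators, year, title, publisher, doi) -> str:
    """Assemble a citation string: Creators (Year): Title. Publisher. DOI link."""
    names = "; ".join(creators)
    return f"{names} ({year}): {title}. {publisher}. {doi_url(doi)}"


if __name__ == "__main__":
    # Made-up dataset details for illustration.
    print(
        format_data_citation(
            creators=["Lastname, A.", "Other, B."],
            year=2016,
            title="Example cohort dataset",
            publisher="Example Data Repository",
            doi="doi:10.1234/example.5678",
        )
    )
```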
Shared Data Use and Its Limitations
Shared data can be used for:
• Descriptive and Historical studies
• Comparative Studies
• Secondary Analysis
• Replication and Validation of Published Articles
• Research Design and Methodological Advancement
• Teaching and Learning
Shared data limitations:
• Lack of availability of suitable data
• Fit of secondary data analysis
• Time to use unfamiliar data
• Unfamiliarity with appropriate statistical methods of
secondary analysis
• Lack of rich-enough documentation
• Concern about ethical reuse of data
Data management plans
Importance of data management plans
• We have explored several important data
management concepts during this presentation.
• A Data Management Plan (DMP) documents how
data will be managed, stored and shared during and
after a research project.
• Some research funders and human research ethics
committees are now requesting that researchers
submit a DMP as part of their project proposal.
What a good Data management plan includes
Important: Each element is linked to further information on the ANDS website.
Preparing a Data Management Plan
• Best research practice is to consider these elements at
the start of a project.
• By planning ahead the research team can improve
research efficiency, guard against data loss, enhance
data security, and ensure research data integrity and
replicability.
• Many Data Management Plan templates are now
freely available for reuse.
Data Management Plan
• Data Management Checklists
• Online planning tools
Brief summary
Do you have a successful data?
Want to know more about data management?
• Deakin University Library's pages on Managing your
Research Data
• Cambridge University pages on Research Data
Management
• University of Leicester Data management support
for researchers pages
• 23 (research data) Things - ANDS
Acknowledgments and References
We deeply appreciate the contribution of the Australian National Data Service.
We deeply appreciate the contribution of the book “Managing and Sharing
Research Data: A Guide to Good Practice” by Louise Corti, Veerle Van den
Eynden, Libby Bishop and Matthew Woollard (SAGE, 20 Mar. 2014).
* Most of the information in this presentation was derived from this valuable
book.
We deeply appreciate the contribution of the UK Data Service and the UK Data Archive.
Acknowledgment
Data Management, Metadata and Data Sharing Workgroup
The Iran Cohort Consortium (ICC)
Acknowledgment
Sherry Lake
Kathryn Unsworth
Acknowledgment
Non Communicable Disease Unit
Davood Khalili
Brian Oldenburg
Contact info
• Email: mlotfaliany@student.unimelb.edu.au
• Tel: +61 450 55 1367
• Please don’t hesitate to contact me!

Best Practice in Data Management and Sharing

  • 1. Best Practice in Data Management and Sharing Mojtaba Lotfaliany; MD, PhDc PhD Student @ Non-Communicable Disease Control, School of Population and Global Health, University of Melbourne Researcher @ Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences
  • 2. First things first! We deeply appreciate the contribution of the following organizations. Most of the information in this presentation was derived from the Australian National Data Service, the UK Data Service, the UK Data Archive and the valuable book entitled “Managing and Sharing Research Data: A Guide to Good Practice”. Non Communicable Disease Unit
  • 3. What is this presentation about? • Research funders are increasingly mandating open access to research data • Governments internationally are demanding transparency in research • The economic climate is requiring much greater reuse of data • Fear of data loss calls for more robust information security practices. • Journal publishers increasingly require submission of the data upon which publications are based for peer review. • Researchers and data users recognize the long-term value of well-prepared data
  • 4. What is this presentation about? All these factors mean that researchers will need to improve, enhance and professionalize their research data management skills to meet the challenge of producing the highest quality shareable and reusable research outputs in a responsible and efficient way.
  • 5. What is this presentation about? Robust research data management techniques give researchers and data professionals the skills required to deal with the rapid developments in the data management environment. This presentation contains a brief introduction to the most important data management and data sharing skills. It aims to help researchers implement data management (and sharing) policies in order to maximize the openness of data and the transparency and accountability of the research they support.
  • 6. What is this presentation about? Introduction: What Is “Research Data”? and Data Lifecycle Part 1: • Why Manage Your Data? • Formatting and organizing the data • Storage and Security of Data • Data documentation and meta data • Quality Control • Version controlling • Working with sensitive data • Controlled Vocabulary • Centralized Data Management
  • 7. What is this presentation about? Part 2: • Data sharing • What are publishers & funders saying about data sharing? • Researchers’ Attitudes • Benefits of data sharing • Considerations before data sharing • Methods of Data Sharing • Shared Data Uses and Their Limitations • Data management plans • Brief summary • Acknowledgment, References
  • 12. What Is “Research Data”? Research data is data that is collected, observed, or created, for purposes of analysis to produce original research results. • Observational • Experimental • Simulation • Derived or compiled • Reference or canonical • Text or Word documents, spreadsheets • Laboratory notebooks, field notebooks, diaries • Questionnaires, transcripts, codebooks • Audiotapes, videotapes • Photographs, films • Test responses • Slides, artifacts, specimens, samples • Collection of digital objects acquired and generated during the process of research • Data files • Database contents including video, audio, text, images • Models, algorithms, scripts • Contents of an application such as input, output, log files for analysis software, simulation software, schemas • Methodologies and workflows • Standard operating procedures and protocols • Correspondence including electronic mail and paper-based correspondence • Project files • Grant applications • Ethics applications • Technical reports • Research reports • Master lists • Signed consent forms
  • 13. How data differs across disciplines RCSB Protein Data Bank Australian Data Archive
  • 17. Why Manage Your Data? Effective research data management of medical, health and clinical data is increasingly recognised as a critical part of the research process. It enables: • Trust in data you obtain for reuse from other sources • Reproducibility of research through increasing veracity of data • Increased quality of your research • Strengthening of researchers’ reputation through increased citations and reach of all research outputs • Increased connectivity between all research outputs, and researchers • More efficient use of scarce research funds • Data description for sharing and collaboration • Reduced risk of loss or corruption of data
  • 18. Why Manage Your Data? By data management we mean all data practices, manipulations, enhancements and processes that ensure that research data are of a high quality, are well organized, documented, preserved, sustainable, accessible and reusable
  • 19. Why Manage Your Data? Video As you watch the cartoon, jot down the data management mistakes you see. How could those mistakes have been avoided?
  • 22. Choosing File Formats • All digital data exist in specific file formats; the form in which information is coded so that a software program can read and interpret those data. • A particular file format is usually linked to a specific software program. • If the same file is to be read by a different program it may need to be converted.
  • 23. Choosing File Formats • Format best suited for data creation • Format best suited for data analyses and other planned uses; • Format best suited for long-term sustainability and sharing of data
  • 24. Choosing File Formats • Non-proprietary or open (CSV vs. MS Excel) • Lossless format (TIFF vs. JPEG) • Common, used by the research community (SPSS) • Standard representation (ASCII, Unicode) • Easy to track changes • Easy to be converted without data loss • Minimal human intervention
  • 25. Data conversion • To present the data • To analyze the data in a different package • To convert images to text (OCR software) • Data preservation o After any conversion, the data should be checked for errors, changes, or loss.
  • 26. File Names • Sensible file names and well-organized folder structures make it easier to find and keep track of data files. • Develop a naming system that works for your project and use it consistently. • Good file names can provide useful cues to the content, status and version of a file, can uniquely identify a file and can help in classifying and sorting files.
  • 27. Best Practice for File Naming • Create meaningful but brief names • Use file names to classify broad types of files • Do not use spaces, dots or special characters such as $ or ? or ! • Use hyphens '-' or underscores '_' to separate logical elements in a file name • Avoid very long file names • Reserve the 3-letter file extension for application-specific codes that represent file format, such as .doc, .xls, .mov and .tif
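The naming rules above can be checked automatically before files are added to a project. The following is a minimal sketch, not part of the original slides: the function name `check_file_name`, the 64-character limit and the 2 to 4 character extension rule are assumptions chosen for illustration.

```python
import re

# Hypothetical rule set based on the slide's guidance: letters, digits,
# hyphens and underscores only, followed by a single short extension.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]{2,4}")
MAX_LENGTH = 64  # assumed threshold for "avoid very long file names"


def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if name.count(".") > 1:
        problems.append("contains dots outside the extension")
    if re.search(r"[^A-Za-z0-9._ -]", name):
        problems.append("contains special characters")
    if len(name) > MAX_LENGTH:
        problems.append("is too long")
    # Catch anything else, such as a missing extension.
    if not problems and not NAME_PATTERN.fullmatch(name):
        problems.append("does not match <name>.<extension>")
    return problems
```

For example, `check_file_name("Biodiv_H20_heatExp_2005_2008.csv")` returns an empty list, while a name such as `"my data (final)!.xls"` is reported for both spaces and special characters.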
  • 28. Best Practice for File Structure • Think carefully how best to structure files in folders • When working in collaboration, the need for an orderly structure is even higher. • Consider the best hierarchy for files, deciding whether a deep or shallow hierarchy is preferable.
  • 29. Best Practice for File Structure Research project files could be organized according to: • Research activity, such as interviews, surveys or focus groups; • Data type, such as images, text or database; • Kind of material, for example, publications, deliverables or documentation.
  • 30. Organize Files Logically Make sure your file system is logical and efficient Project 1 Time_point1 Time_point2 Biomarkers Anthropometrics Biodiv_H20_heatExp_2005_2008.csv Biodiv_H20_predatorExp_2001_2003.csv Biodiv_H20_planktonCount_start2001_active.csv Biodiv_H20_chla_profiles_2003.csv Project Name Location Experiment Name Date File Format
  • 33. Best Practice in Storing Data and Preservation • Store data uncompressed in non-proprietary or open standard formats for long-term software readability • Copy or migrate data files to new media every two to five years • Check the data integrity of stored data files at regular intervals • Organize and label stored data clearly so they are easy to locate and physically accessible • Ensure that areas and rooms for storage of digital or non-digital data are fit for the purpose, structurally sound and free from the risk of flood and fire • Create digital versions of paper-based data or information in PDF/A format for long-term preservation and storage.
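One common way to check the integrity of stored files at regular intervals is to record a checksum for every file and compare it later. The sketch below is an illustration under that assumption; the slides do not prescribe a method, and the function names `build_manifest` and `changed_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to `folder`) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List files from the manifest that are now missing or altered."""
    current = build_manifest(folder)
    return sorted(
        name for name in manifest if current.get(name) != manifest[name]
    )
```

A manifest built when the data are archived can be stored alongside the backup. Running `changed_files` against it after each migration to new media flags any file that was corrupted or lost along the way.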
  • 34. Backup Your Data • Reduce the risk of damage or loss • Use multiple locations (here, near, far) • Create a backup schedule • Use reliable backup medium • Test your backup system (i.e., test file recovery)
  • 35. Physical data security • Controlling access to rooms and buildings where data, computers or media are held; • Logging the removal of, and access to, media or hardcopy material in store rooms; • Transporting sensitive data only under exceptional circumstances.
  • 36. Network Security • Not storing confidential data such as those containing personal information on servers or computers connected to an external network, particularly servers that host Internet services • Firewall protection and security-related upgrades and patches to operating systems to avoid viruses and malicious codes.
  • 37. Security of computer systems • Locking computer systems with a password and installing a firewall system • Protecting servers against power surges with line-interactive uninterruptible power supply (UPS) systems • Imposing non-disclosure agreements for managers or users of confidential data • Not sending personal or confidential data via email or other file transfer means without first encrypting them • Remembering that file-sharing services such as Google Docs and Dropbox may not be suitable for certain types of information.
  • 39. Access controlling and security • Needing specific authorization from the data owner to access data • Placing confidential data under embargo for a given period of time until confidentiality is no longer pertinent • Providing access to approved researchers only • Providing secure access to data through enabling remote analysis of confidential data but excluding the ability to download data • Mixed levels of access regulations
  • 40. Mixed levels of access regulations
  • 42. Data documentation • The collective term 'data documentation' includes information on why and how data were created, prepared or digitized, what they mean, what their content and structure are, and any alterations or coding that may have taken place. • Good documentation is critical for understanding data in the short, medium and longer term; and is vital for successful long-term data preservation.
  • 43. Data documentation levels Data documentation requires descriptive material at two levels. • The high-level information, commonly known as study-level documentation, describes the research project, the data creation processes, rights and general contexts. • The data-level information covers descriptions and annotations at the file and within-file level. Metadata are a specific subset of data documentation that provides structured, searchable information.
  • 45. Good study-level data documentation includes: • Research design and context of data collection • Data collection methods • Structure of data files, with number of cases, records, files and variables, as well as any relationships among such items; • Secondary data sources used and provenance • Data validation, checking, proofing, cleaning and other quality assurance procedures • Modifications made to data over time since their original creation and identification of different versions of datasets; • Information on data confidentiality, access and any applicable conditions of use; • Publications, presentations and other research outputs that explain or draw on the data.
  • 46. Data-level data documentation Metadata can be generated manually or created automatically, and can be stored within the database or in separate files.
  • 49. Data Dictionary Project Documentation Dataset Documentation • Context of data collection • Data collection methods • Structure, organization of data files • Data sources used • Data validation, quality assurance • Transformations of data from the raw data through analysis • Information on confidentiality, access and use conditions • Variable names and descriptions • Explanation of codes and schemas used • Algorithms used to transform data • File format and software (including version) used
  • 51. Good data-level data documentation (for tabular data) includes: • Names, labels and descriptions • Value code labels • Coding and classification schemes • Codes for missing values • Derived data • Weighting and grossing variables
  • 53. Data Quality Control • Data are of high quality if they are fit for their intended uses in operations, decision making and planning. • If the ISO 9000:2015 definition of quality • Completeness • Validity • Accuracy • Consistency • Availability • Timeliness
  • 54. Data Quality Control in Data Entry • Calibration of instruments • Taking multiple measurements, observations or samples • Checking the truth of the record with an expert; • Using standardized methods and protocols for capturing observations • Customize questions
  • 55. Data Checking During data checking, data are edited, cleaned, verified, cross-checked and validated. • Double-checking coding of observations or responses and out-of-range values • Checking data completeness • Verifying random samples of the digital data against the original data • Double entry of data • Statistical analyses such as frequencies, means, ranges or clustering to detect errors and anomalous values • Proof-reading transcriptions • Peer review
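Two of the checks listed above, flagging out-of-range values and comparing double-entered data, are easy to script. This is a minimal sketch with hypothetical function names and record layouts; real projects would adapt the fields and ranges to their own codebook.

```python
def out_of_range(records: list[dict], field: str, low: float, high: float) -> list[int]:
    """Return indices of records whose `field` is missing or outside [low, high]."""
    flagged = []
    for index, record in enumerate(records):
        value = record.get(field)
        if value is None or not (low <= value <= high):
            flagged.append(index)
    return flagged


def double_entry_mismatches(first: dict, second: dict) -> list[tuple]:
    """Compare two independent data entries keyed by record id.

    Returns (record_id, field, first_value, second_value) for every
    field where the two entries disagree.
    """
    mismatches = []
    for record_id in sorted(set(first) | set(second)):
        a = first.get(record_id, {})
        b = second.get(record_id, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                mismatches.append((record_id, field, a.get(field), b.get(field)))
    return mismatches
```

Each flagged record still needs to be verified against the original source (paper form, instrument output) before it is corrected; the script only tells you where to look.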
  • 56. Data Quality Control • Misleading data • Duplicate data • Incorrect data • Inaccurate data • Non-integrated data • Data that violates business rules • Data without a generalized formatting • Incorrectly punctuated or spelled data
  • 57. Data Quality Control • Manually? • OpenRefine (formerly Google Refine) is a valuable open source tool that is similar to Excel but more powerful. You can use it to: record data; manipulate data; clean up dirty data; and to transform datasets. • Other alternatives
  • 59. What is a version? • A version is “a particular form of something differing in certain respects from an earlier form or other forms of the same type of thing”. • In the case of research data, a new version of a dataset may be created when an existing dataset is reprocessed, corrected or appended with additional data. • Versioning is one means by which to track changes associated with ‘dynamic’ data that is not static over time.
  • 60. What is a version? • Scenario 1: a new observation is created and it should be added to the dataset • Scenario 2: an existing observation is removed and it should be deleted from the dataset • Scenario 3: an error was identified in one of the existing observations stored in the dataset and this error must be corrected.
  • 61. Version controlling and tracking • Version information makes a revision of a dataset uniquely identifiable. • Uniqueness can be used by researchers to determine whether and how data has changed over time and to determine specifically which version of a dataset they are working with. • Explicit versioning allows for repeatability in research, enables comparisons, and prevents confusion.
  • 64. Tools for version controlling
  • 67. Best Practice in Version Controlling • Decide how many versions of a file to keep • Identify milestone versions to keep • Uniquely identify different versions of files using a systematic naming convention, such as using version numbers or dates • Record changes made to a file when a new version is created • Record relationships between items where needed • Track the location of files if they are stored in a variety of locations • Regularly synchronize files in different locations • Identify a single location for the storage of milestone and master versions.
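A systematic naming convention with a change record can be sketched in a few lines. The `_v<major>-<minor>` suffix used here is an assumed convention, not one taken from the slides; it uses a hyphen rather than a dot so that the version number does not conflict with the earlier file-naming advice. The function names are hypothetical.

```python
import re
from datetime import date

# Assumed convention: <stem>_v<major>-<minor>.<extension>
VERSION_RE = re.compile(
    r"(?P<stem>.+)_v(?P<major>\d+)-(?P<minor>\d+)(?P<ext>\.[A-Za-z0-9]+)"
)


def next_version_name(name: str, milestone: bool = False) -> str:
    """Return the file name for the next version.

    A milestone bumps the major number and resets the minor number;
    any other change bumps the minor number only.
    """
    match = VERSION_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the naming convention")
    major, minor = int(match["major"]), int(match["minor"])
    if milestone:
        major, minor = major + 1, 0
    else:
        minor += 1
    return f"{match['stem']}_v{major}-{minor}{match['ext']}"


def change_log_entry(name: str, note: str, when: date) -> str:
    """Format one tab-separated line for a plain-text change log."""
    return f"{when.isoformat()}\t{name}\t{note}"
```

For larger or collaborative projects, a dedicated version control system may be a better fit than file names alone; this sketch only covers the naming and change-recording points in the list above.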
  • 69. What is sensitive data Sensitive data are data that can be used to identify an individual, species, object, or location that introduces a risk of discrimination, harm, or unwanted attention.
  • 70. What is sensitive data Some examples but not all:
  • 71. Data de-identification A person's identity can be disclosed from: • Direct identifiers such as names, addresses, postcode information, telephone numbers or pictures • Indirect identifiers which, when linked with other publicly available information sources, could identify someone, for example information on workplace, occupation or exceptional values of characteristics like salary or age
  • 72. Data de-identification • Removing direct identifiers, e.g. name or address • Aggregating or reducing the precision of information or a variable, e.g. replacing date of birth by age groups • Generalizing the meaning of detailed text, e.g. replacing a doctor's detailed area of medical expertise with an area of medical specialty • Using pseudonyms • Restricting the upper or lower ranges of a variable to hide outliers, e.g. top-coding salaries • Considering statistical disclosure control (SDC) techniques
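Three of the techniques above (age grouping, top-coding and pseudonyms) can be illustrated with a short sketch. The record fields, the 10-year band width and the keyed-hash pseudonym scheme are all assumptions made for this example; on their own they do not guarantee that a dataset is safe to share.

```python
import hashlib
import hmac


def age_group(age: int, width: int = 10) -> str:
    """Replace an exact age with a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def top_code(value: float, cap: float) -> float:
    """Restrict the upper range of a variable to hide outliers."""
    return min(value, cap)


def pseudonym(identifier: str, secret: str) -> str:
    """Derive a stable pseudonym from an identifier using a keyed hash.

    The secret must be stored separately from the shared data;
    anyone holding it can recompute the pseudonyms.
    """
    digest = hmac.new(secret.encode("utf-8"), identifier.encode("utf-8"), hashlib.sha256)
    return "P" + digest.hexdigest()[:10]


def deidentify(record: dict, secret: str, salary_cap: float) -> dict:
    """Build a shareable record without the direct identifier."""
    return {
        "pid": pseudonym(record["name"], secret),
        "age_group": age_group(record["age"]),
        "salary": top_code(record["salary"], salary_cap),
    }
```

Indirect identifiers that remain in the output, such as a rare occupation, can still disclose identity when combined with other sources, so the result should be reviewed before release.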
  • 74. Data de-identification Read more about techniques • 'Preparing raw clinical data for publication: guidance for journal editors, authors, and peer reviewers'. • The UK Data Service outlines approaches for de-identifying quantitative and qualitative data.
  • 76. Controlled Vocabulary • Controlled vocabularies ensure shared understanding of the terminologies used in taxonomies and classifications. • Using established vocabularies promotes interoperability, discovery and re-use of data. • Goal of Controlled vocabulary
  • 79. METeOR METeOR is Australia’s repository for national metadata standards for health, housing and community services statistics and information. An example: Person—weight (measured), total kilograms N[NN].N
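A representation format such as `N[NN].N` can be enforced at data entry. The sketch below assumes that the notation means one to three digits, a decimal point, then exactly one digit (with the brackets marking optional digits); that reading is an assumption for illustration, and the function name `is_valid_weight` is hypothetical.

```python
import re

# Assumed reading of N[NN].N: 1 to 3 digits, a decimal point, 1 digit.
WEIGHT_FORMAT = re.compile(r"\d{1,3}\.\d")


def is_valid_weight(text: str) -> bool:
    """Check whether a weight in kilograms matches the assumed format."""
    return WEIGHT_FORMAT.fullmatch(text) is not None
```

Under this reading, `"72.5"` and `"105.0"` are accepted, while `"72"`, `"72.55"` and `"1000.5"` are rejected.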
  • 80. Have your own controlled Vocabulary
  • 82. Controlled Vocabulary 1. Find and learn about controlled vocabularies relevant to research. 2. Access those vocabularies and reuse them in your community. 3. Integrate vocabularies into your local information systems at a technical level. 4. Upload and describe a vocabulary to share with others. 5. Make a vocabulary machine readable (more easily integrated into other's systems). 6. Create new or import existing vocabularies and manage them with your community's input.
  • 83. A good example The Australian Longitudinal Study of Ageing
  • 85. Centralized Data Management • researchers can share good practice and data management experiences with each other • building capacity, collective knowledge and resources for the center • new researchers can immediately implement good data practices from this shared expertise • a uniform approach to data management by creating standard data policies and procedures • keeping track of projects and owners of data over time, especially when researchers come and go • storing and backing up data in a central location; • making researchers and staff aware of duties, responsibilities, funder and legal requirements relating to research data, with easy access to relevant information • ensuring that data management is costed into funding proposals.
  • 86. A Centralized Data Management may include: • what the data mean • how they were created • where they were obtained • who owns them • who has access, use and editing rights • who is responsible for managing them • storage and backup strategies • data quality control procedures • different versions of files • how they will or can be shared
  • 87. A Centralized Data Management may include: • Acts and Regulations and local statement or policy on data sharing • codes of practice or professional standards relevant to research data • exemplar data management plans • a statement of institutional IT data management and existing backup procedures • security policy for data storage and data format recommendations • quality control standards for data collection and data entry • file-naming and version control guidance • template consent forms and information sheets • example ethical review forms and data anonymization guidelines • confidentiality agreements for data handlers.
  • 88. A Centralized Data Management, How? • An institutional or departmental drive where access can be provided to external researchers, for example through remote access via virtual private network (VPN) techniques • A secure file transfer protocol (FTP) server • A virtual research environment (VRE) or portal environment.
  • 89. A Centralized Data Management, How? • A content management system such as Drupal • Cloud-based file-sharing areas such as Dropbox, Google Docs or Google Drive • A data repository such as DSpace, Fedora, EPrints, CKAN or cloud-based figshare.
  • 90. Part 2: Data sharing
  • 91. Open / Shared / Closed: The world of data Video
  • 92. What are publishers & funders saying about data sharing?
  • 95. The Data Sharing Agenda Organization for Economic Cooperation and Development (OECD) Principles and Guidelines for Access to Research Data from Public Funding: Publicly funded research data are a public good, produced in the public interest, and should be made openly available with as few restrictions as possible in a timely and responsible manner without harming intellectual property (OECD, 2007). The Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities: The Berlin Declaration called for promoting knowledge dissemination through the open access paradigm via the internet, which requires the worldwide web to be sustainable, interactive and transparent, with openly accessible and compatible content and tools (Berlin Declaration, 2003). The High Level Expert Group on Scientific Data: Noting the rising tide of data, the group proposed that we are on the verge of a great new leap in scientific capability, fuelled by data, with a need for a scientific e-infrastructure that supports seamless access, use, reuse and trust of data (European Commission, 2010). The report sketches the benefits and costs of accelerating the development of a fully functional e-infrastructure for scientific data. Open infrastructure, open culture and open content need to go hand in hand.
  • 96. Data sharing policies of major medical funders
  • 104. Journals and Publishers • Data sharing policies are becoming increasingly common in Australia and internationally. • More and more journal publishers are asking authors to make the data underpinning a journal article available.
  • 105. New journal data policies
  • 107. Researcher motivations for sharing data Source
  • 110. Why some researchers remain reluctant to share their own research data • 42% Intellectual property or confidentiality issues • 36% My funder/institution does not require data sharing • 26% I am concerned that my research will be scooped • 26% I am concerned about misinterpretation or misuse • 23% Ethical concerns • 22% I am concerned about being given proper citation credit or attribution • 21% I did not know where to share my data • 20% Insufficient time and/or resources • 16% I did not know how to share my data • 12% I don’t think it is my responsibility • 12% I did not consider the data to be relevant • 11% Lack of funding • 7% Other Source
  • 111. Why some researchers remain reluctant to share their own research data • My data are not of interest or use to anyone else. • I want to publish my work before anyone else sees my data. • I have not got the time or money to prepare data for sharing. • If I ask my respondents for consent to share their data then they will not agree to participate in the study. • Other researchers would not understand my data at all or may use it in a wrong way
  • 112. Data sharing trends by country Source: http://www.acscinf.org/PDF/Giffi-%20Researcher%20Data%20Insights%20--%20Infographic%20FINAL%20REVISED.pdf
  • 113. Benefits of data sharing
  • 114. Benefits for researchers • Increases visibility of scholarly work; • Likely to increase citation rates; • Enables new collaborations; • Encourages scientific enquiry and debate; • Promotes innovation and potential new data uses; • Establishes links to next generation of researchers.
  • 115. Benefits for research funders: • Promotes primary and secondary use of data; • Makes optimal use of publicly funded research; • Avoids duplication of data collection; • Maximizes return on investment.
  • 116. Benefits for the scholarly community • Maintains professional standards of open inquiry; • Maximizes transparency and accountability; • Promotes innovation through unanticipated and new uses of data; • Enables scrutiny of research findings; • Improves quality from verification, replication and trustworthiness; • Encourages the improvement and validation of research methods; • Provides resources for teaching and learning.
  • 117. Benefits for research participants • Allows maximum use of contributed information; • Minimizes data collection on difficult-to-reach or over-researched populations; • Allows participants' experiences to be understood as widely as ethically possible.
  • 118. Benefits for the public • Advances science to the benefit of society; • Adopts emerging norms such as open access publishing; • Is, and appears to be, open and accountable; • Complies with openness laws and regulations.
  • 120. Considerations before data sharing • Good data management • Meeting ethical and legal obligations • Intellectual property rights • Data licensing • Metadata schema and cross-walking
  • 121. Good data management • Data can only be shared if they are of high quality, well-curated, well-documented, and can be referenced and indexed. • Data integrity translates as accuracy and consistency and is ensured through quality control.
  • 122. Legal obligations Legislation that may impact on the sharing of data: • Privacy Act 1988 • Human Rights Act 2004 • Freedom of Information Act 1982 • The Freedom of Information Amendment (Reform) Act 2010
  • 123. Human Research Ethics Researchers should: • Inform participants how research data will be stored, preserved and used in the long-term • Inform participants how confidentiality will be maintained, e.g. by anonymizing data • Obtain informed consent, either written or verbal, for data sharing
  • 124. Levels of consent • ‘Specific’: limited to the specific project under consideration; • ‘Extended’: given for the use of data or tissue in future research projects that are either (i) an extension of, or closely related to, the original project; or (ii) in the same general area of research (for example, genealogical, ethnographical, epidemiological, or chronic illness research); • ‘Unspecified’: given for the use of data or tissue in any future research.
  • 125. Intellectual Property Rights • In most research institutions, such as universities, the institution owns IP rights arising from research undertaken by employees in the course of their employment. • A research funder may also wish to exert some claim over rights, although, in most cases, IP rights are attributed to the researcher unless an output becomes commercially viable. • If a university research project has commercial collaborators, there may be joint IP rights in the research outputs, which are best handled via consortium agreements or legal contracts.
  • 126. Intellectual Property Rights: Copyright and Exemptions Under Fair Dealing • Copyright is an intellectual property right assigned automatically to the creator. • Copyright cannot be taken away without consent and cannot be abused without the possibility of legal action ensuing. • Most research outputs, including spreadsheets, publications, textual files, reports and computer programs, fall under literary work and are therefore protected by copyright.
  • 127. The Freedom of Information Legislation (FOI) • People have the right to request access to recorded information held by public sector organizations. • This can include research data held by universities or research institutions. • Many countries have some form of Freedom of Information legislation, which is designed to ensure accountability and good governance in public authorities. • Research data can be requested under the FOI Act and legally supplied to anyone, but copyright and IP rights to such data remain with the original researcher.
  • 128. What is a license? Why apply a license? • When considering sharing your data, you need to consider how you want your data to be reused by other researchers or students. • You can specify this by licensing the data to match the intended uses. • The data publisher, be it a data center, archive or repository, usually does not expect to have rights in the data collections it distributes or provides access to. • Rather, a researcher or data creator will retain the copyright in their data and give the center a non-exclusive license to redistribute the data.
  • 129. What is a license? Why apply a license? • All copyright holders with some claim over the data collection need to agree to the terms of deposit. • Without this license agreement in place, a data center or institutional repository cannot legally provide access to the data.
  • 132. How do I apply a license? • You must ‘own’ the data to apply the license • Look at your institution's IP policies • When partnering: agree, before collecting the data, who can apply the license and what that license will be • Include this information in the HREC application
  • 133. How open can I be? • Consent? (For what?) • Potential for harm/discrimination? • Data modified to address identification, limit harm? • HREC approval?
  • 134. Metadata schema and cross-walking • A metadata standard is a schema that has been formally approved and published (e.g. ANZLIC and DDI). • Numerous metadata standards exist, and the standard chosen to describe resources such as research data should be appropriate to the project or discipline. • Directory of Disciplinary Metadata
  • 135. Metadata cross-walking • Many contributors use different metadata schemas in creating their research data records. • The records of each contributor need to be ‘cross-walked’. • A schema crosswalk is a table that shows equivalent elements in more than one database schema. • It maps the elements in one schema to the equivalent elements in another schema.
  • 136. Metadata cross-walking • The flexible structure of XML makes it possible to convert data from one metadata standard to another using XSLT. • XSLT (Extensible Stylesheet Language Transformations) is a language for transforming XML documents into other XML documents.
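A schema crosswalk of the kind described above can be sketched as a simple lookup table. The sketch below uses a plain Python dictionary rather than XSLT, and the element names on both sides are hypothetical placeholders for illustration; they are not taken from ANZLIC, DDI or any other published standard.

```python
# Hypothetical crosswalk table: element in the source schema -> equivalent
# element in the target schema. All names here are illustrative assumptions.
CROSSWALK = {
    "title": "name",
    "creator": "author",
    "date": "datePublished",
    "description": "abstract",
}


def crosswalk_record(record, mapping):
    """Rename the fields of one metadata record according to the crosswalk.

    Returns a tuple (converted, unmapped): `converted` holds the values under
    their target-schema names, and `unmapped` lists source fields that have no
    equivalent in the mapping, so they can be reviewed by hand.
    """
    converted = {}
    unmapped = []
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped.append(field)
        else:
            converted[target] = value
    return converted, unmapped


# Example record in the (hypothetical) source schema.
source_record = {
    "title": "Cohort baseline survey",
    "creator": "Example Research Group",
    "date": "2016",
    "funder": "Example Funder",
}

converted, unmapped = crosswalk_record(source_record, CROSSWALK)
print(converted)
print(unmapped)
```

Reporting unmapped fields separately, instead of silently dropping them, makes gaps between the two schemas visible; real crosswalks often also need one-to-many mappings or value conversions, which this sketch does not cover.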
  • 137. Methods of Data Sharing
  • 138. The crucial role of data repositories • Informal sharing • Specialist data centers, archive or repository • An institutional repository • Submitting to a journal to support a publication • Publish in a data journal • Dissemination via a project or institutional website • Self-publishing via a cloud-based system
  • 139. Informal sharing Data can be shared with colleagues and trusted collaborators or upon request. • Easy • No citation • Little credit • Appropriate data are not easy to find • No data preservation • Duplication of effort
  • 140. Specialist data centers, archive or repository • Repositories enable discovery of data by publishing data descriptions ("metadata") about the data they hold - like a library catalogue describes individual materials held in a library. • You can publish a description (i.e. the metadata) of your data without making the data itself openly accessible, which enables you to place conditions around access to the data.
  • 141. Specialist data centers, archive or repository • Assurance that data meet set quality standards • Long-term preservation of data in standard file formats which can be upgraded when needed due to software upgrades or changes • Safe-keeping of data in a secure environment with the ability to control access where this is required • Regular data backups • Online resource discovery of data through data catalogues
  • 142. Specialist data centers, archive or repository • Access to data in popular file formats • Licensing arrangements to acknowledge data rights and appropriate handling of confidential data • Standard citation mechanism to acknowledge data creation; • Promotion of data to many users • Monitoring of the secondary usage of data • Management of access to data and user queries on behalf of the data owner
  • 143. An institutional repository • Wide visibility for the institution's scholars • Meeting security and ethical obligations • Concerns about data preservation • Concerns about data shareability
  • 144. Access control • Data centers typically liaise with the researchers who own the data in selecting the most suitable type of access for data. • Access regulations should always be proportionate to the kind of data and confidentiality involved. • Access conditions which require that the data center contact the researcher directly about each particular request may result in extended delays before access is granted.
  • 145. Data portals Data portals or aggregators draw together research data records from a number of repositories. • Research Data Australia (RDA) aggregates records from over 100 Australian research repositories • re3data: the largest and most comprehensive registry of data repositories available on the web • Subject-specific portals
  • 147. Journals and Data Publishing • As supplementary material (mandatory/optional) • Data Paper or Data Article • Metadata or data • Data journals or regular journals • Data Papers are subject to full peer review • Quality control and technical reviews
  • 149. Project websites and Cloud space • Project websites can provide easy immediate storage and simplified access to research data • No sustainability for the longer term • Difficult to control who is using data and how • Needs a backup and exit plan for sharing data • This should be viewed as a short-term impact-generating facility and not a long-term data storage solution.
  • 150. What is data citation? • Data citation refers to the practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to outputs such as journal articles, reports and conference papers. • Citing data is increasingly being recognized as one of the key practices leading to recognition of data as a primary research output.
  • 151. The importance of data citation? • Acknowledges the author's sources; • Makes identifying data easier; • Promotes the reproduction of research results; • Makes it easier to find data; • Allows the impact of data to be tracked; • Provides a structure that recognizes and can reward data creators.
  • 153. Data citation conventions • Uniform Resource Names (URN) • Uniform Resource Locators (URL) • Digital Object Identifiers (DOI) Some other similar identifiers: • International Standard Book Number (ISBN) • The Open Researcher and Contributor ID (ORCID)
  • 154. Shared Data Use and Its Limitations
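As a rough illustration of citing data with a persistent identifier, the sketch below assembles a citation string from a few metadata fields and a DOI. The field order (creators, year, title, version, publisher, identifier) is an assumed convention chosen for this example, the function name is hypothetical, and the dataset and DOI shown are made up; a repository's or journal's own citation guidance should take precedence.

```python
def format_data_citation(creators, year, title, publisher, doi, version=None):
    """Build a dataset citation string ending in a resolvable DOI link.

    The layout is an illustrative assumption, not a mandated standard:
    Creators (Year): Title. Version X. Publisher. https://doi.org/<doi>
    """
    authors = "; ".join(creators)
    parts = [f"{authors} ({year}): {title}."]
    if version:
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    # A DOI is turned into a link by prefixing the doi.org resolver address.
    parts.append(f"https://doi.org/{doi}")
    return " ".join(parts)


# Hypothetical dataset and DOI, used only to show the output shape.
citation = format_data_citation(
    creators=["Doe, J.", "Roe, R."],
    year=2016,
    title="Example Cohort Dataset",
    publisher="Example Data Repository",
    doi="10.1234/example.5678",
    version="1.0",
)
print(citation)
```

Keeping the identifier as the final element means the citation still points to the data even if the repository later changes its web address, which is the main advantage of a DOI over a plain URL.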
  • 155. Shared data can be used for: • Descriptive and Historical studies • Comparative Studies • Secondary Analysis • Replication and Validation of Published Articles • Research Design and Methodological Advancement • Teaching and Learning
  • 156. Shared data limitations: • Lack of availability of suitable data • Fit of secondary data analysis • Time to use unfamiliar data • Unfamiliarity with appropriate statistical methods of secondary analysis • Lack of rich-enough documentation • Concern about ethical reuse of data
  • 158. Importance of data management plans • We have explored several important data management concepts during this presentation. • A Data Management Plan (DMP) documents how data will be managed, stored and shared during and after a research project. • Some research funders and human research ethics committees are now requesting that researchers submit a DMP as part of their project proposal.
  • 159. What a good data management plan includes • Important: each element is linked to further information on the ANDS website.
  • 160. Preparing a Data Management Plan • The best research practice is to consider these at the start of a project. • By planning ahead the research team can improve research efficiency, guard against data loss, enhance data security, and ensure research data integrity and replicability. • Many Data Management Plan templates are now freely available for reuse.
  • 161. Data Management Plan • Data Management Checklists • Online planning tools • Video
  • 169. Do you have successful data?
  • 170. Want to know more about data management? • Deakin University Library's pages on Managing your Research Data • Cambridge University pages on Research Data Management • University of Leicester Data management support for researchers pages • 23 (research data) Things - ANDS
  • 172. Acknowledgment, References We deeply appreciate the contribution of the Australian National Data Service.
  • 173. Acknowledgment, References We deeply appreciate the contribution of the book entitled “Managing and Sharing Research Data: A Guide to Good Practice” by Louise Corti, Veerle Van den Eynden, Libby Bishop and Matthew Woollard (SAGE, 20 Mar. 2014). * Most of the information in this presentation was derived from this valuable book.
  • 174. Acknowledgment, References We deeply appreciate the contribution of the UK Data Service and the UK Data Archive.
  • 176. Acknowledgment Data Management, Metadata and Data Sharing Workgroup The Iran Cohort Consortium (ICC)
  • 178. Acknowledgment Non Communicable Disease Unit Davood Khalili Brian Oldenburg
  • 179. Contact info • Email: mlotfaliany@student.unimelb.edu.au • Tel: +61 450 55 1367 • Please don’t hesitate to contact me!