MAKING SENSE OF A DIGITAL COLLECTION
This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License
Gareth Knight
London School of Hygiene & Tropical Medicine
gareth.knight@lshtm.ac.uk
Getting Started in Digital Preservation
The Information Technologists, London
23rd April 2015
Case Studies
• National service that preserved research, teaching and learning resources in the arts & humanities between 1996 and 2008
• Institutional RDM service that helps LSHTM researchers to curate & preserve research data in public health & tropical medicine
Need for Digital Preservation
Data storage media + Computing device + Operating system + Software application = Information
• Deteriorate & change over time
• Become obsolete & are replaced over time
What does this mean?
“Digital information lasts forever – or five years, whichever comes first”
Jeff Rothenberg, 1997
Climb the preservation mountain
“the series of managed activities necessary to ensure continued access to digital materials for as long as necessary.”
Neil Beagrie and Maggie Jones (2008)
Beagrie & Jones: http://www.dpconline.org/advice/preservationhandbook/introduction/definitions-and-concepts
Caplan: http://journals.ala.org/ltr/article/view/4224/4809
Modified version of Caplan’s Preservation Pyramid (top to bottom):
• Content can be used
• Content is understandable
• Content is rendered accurately
• Bits are stored exactly
• Its value is recognised & it is acquired
• Data exists
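Each level of the pyramid rests on the ones beneath it: content cannot be rendered accurately if its bits were not stored exactly. A minimal Python sketch of that dependency follows, assuming a curator has recorded a simple pass/fail result for each level; the checks dictionary and the highest_level_reached function are hypothetical illustrations, not part of Caplan's model.

```python
from __future__ import annotations

# Levels of the modified preservation pyramid, ordered from the base upwards.
PYRAMID_LEVELS = [
    "Data exists",
    "Its value is recognised & it is acquired",
    "Bits are stored exactly",
    "Content is rendered accurately",
    "Content is understandable",
    "Content can be used",
]


def highest_level_reached(checks: dict[str, bool]) -> str | None:
    """Return the highest level whose own check and every check beneath
    it pass, or None if even the base level fails.

    checks maps a level name to the outcome of whatever test a curator
    applies for that level (a hypothetical input, not a standard format).
    A level that is missing from checks counts as failed.
    """
    reached = None
    for level in PYRAMID_LEVELS:
        if not checks.get(level, False):
            break  # a level cannot stand on a missing foundation
        reached = level
    return reached
```

A failed level caps the assessment even if higher checks pass, which mirrors the shape of the pyramid.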
Digital Detectives
• Digital preservation is often a process of investigation & deduction
• Resource intensive
– Time
– Physical space
– Hardware/software costs
• How much effort are you willing to make? What is good enough?
https://www.flickr.com/photos/ollieolarte/3028314931
Acquire data
Acquisition depends upon the object to be preserved & how it is stored:
• Media: floppy disk, CD/DVD, ZIP/Jaz disk, hard disk, solid state devices, etc.
• Electronic: email, cloud services
Invest in infrastructure to support the preservation process:
• Computer hardware
• Media readers
• 3rd party services can provide advice and hardware rental where needed
https://www.flickr.com/photos/adactio/13127134455
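When media are failing, acquisition often means reading the same sectors several times and keeping a record of anything that could not be recovered. The Python sketch below illustrates that retry-and-record idea for a file or device path. The function name, the 512-byte block size and the five-attempt limit are assumptions, and dedicated disk-imaging tools handle failing hardware far more robustly.

```python
import hashlib
import os


def image_with_retries(source_path, dest_path, block_size=512,
                       max_attempts=5, total_size=None):
    """Copy source to dest block by block, retrying unreadable blocks.

    Blocks that still fail after max_attempts are zero-filled so later
    offsets stay aligned, and their offsets are returned so the gaps are
    documented rather than hidden. Returns (sha256_hex, bad_offsets).
    total_size must be supplied for devices whose size fstat cannot report.
    """
    bad_offsets = []
    digest = hashlib.sha256()
    with open(source_path, "rb", buffering=0) as src, open(dest_path, "wb") as dst:
        if total_size is None:
            total_size = os.fstat(src.fileno()).st_size  # regular files only
        offset = 0
        while offset < total_size:
            want = min(block_size, total_size - offset)
            block = None
            for _ in range(max_attempts):
                try:
                    src.seek(offset)
                    block = src.read(want)
                    break
                except OSError:
                    continue  # read error: retry the same block
            if block is None:
                bad_offsets.append(offset)  # record the unrecoverable block
                block = b"\x00" * want      # pad so offsets stay aligned
            elif not block:
                break  # media ended earlier than expected
            dst.write(block)
            digest.update(block)
            offset += len(block)
    return digest.hexdigest(), bad_offsets
```

The returned SHA-256 value can be stored alongside the image so that later copies can be checked against it.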
Case Study: AHDS History dataset
Deposited in 2006 by the children of a noted researcher & processed by GK
Documentation:
Accompanying notes in the researcher’s handwriting described a history database they were working on in 1988.
Challenges:
• A 5.25" disk drive was available
• The disk was failing, but a complete copy was created on the 5th attempt
• Disk analysis revealed text content… the author's short stories, not a dataset!
Result:
Not accessioned, but the children were pleased
http://www.old-computers.com/museum/computer.asp?st=1&c=810
History database created in 1988 on a Shelton Instruments Sig-Net running the CP/M 2.2 operating system & saved to 5.25" disk
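The discovery that the disk held prose rather than a database came from analysing its contents. One simple way to get a first impression of an unknown disk image, sketched here in Python under the assumption that the content is plain ASCII, is to measure how much of it is printable text and to pull out readable runs in the manner of the Unix strings utility. The function names are hypothetical, and a high printable ratio is a hint, not proof, of what the content is.

```python
from __future__ import annotations

import re
import string

# Byte values that commonly occur in plain ASCII text (printables plus whitespace).
TEXT_BYTES = set(string.printable.encode("ascii"))


def printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII or common whitespace."""
    if not data:
        return 0.0
    return sum(b in TEXT_BYTES for b in data) / len(data)


def extract_strings(data: bytes, min_length: int = 4) -> list[str]:
    """Return runs of at least min_length printable ASCII characters
    (space to tilde), similar in spirit to the Unix strings utility."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return [match.group().decode("ascii") for match in re.finditer(pattern, data)]
```

Skimming the output of extract_strings is often enough to tell narrative text from structured records before investing in format-specific tools.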
Check completeness
What does the creator intend to provide?
• Data
• Documentation
• Research instruments
What have they actually provided?
• Some data
• Creation software & random files
• Personal music collection?
Request a file manifest:
– Filename
– Description
– Format
https://www.flickr.com/photos/kyngpao/14455832915
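A depositor's manifest can be started automatically and then annotated by hand. This Python sketch walks a deposit folder and writes a CSV with the three fields listed above plus a size and SHA-256 checksum. The column names are assumptions, the description column is left blank for the depositor, and the format column is only a guess from the file extension.

```python
import csv
import hashlib
import mimetypes
import os

FIELDS = ["filename", "description", "format", "size_bytes", "sha256"]


def sha256_of(path, chunk_size=65536):
    """Checksum a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, manifest_path):
    """Write a CSV manifest for every file under root and return its rows."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit sub-directories in a stable order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rows.append({
                "filename": os.path.relpath(full, root).replace(os.sep, "/"),
                "description": "",  # left blank for the depositor to complete
                # Extension-based guess only; file contents are not inspected.
                "format": mimetypes.guess_type(name)[0] or "unknown",
                "size_bytes": os.path.getsize(full),
                "sha256": sha256_of(full),
            })
    with open(manifest_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Write the manifest outside the deposit folder so that it does not list itself on a second run.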
Case Study: Early English Books Online
Collection of 125,000 early printed books deposited for preservation:
• XML files, scanned TIFFs & PDFs for each page
• Well structured & labelled
Problems:
• Hard disk was failing
• XML output from a Content Management System had an incomplete header & missing schema
• 30% of files referenced in the XML were missing
Solution:
• Obtained the schema & missing files (but it took a long, long time)
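A gap such as the missing 30% can be measured by checking every file the XML points to. The sketch below assumes, hypothetically, that page files are referenced through an attribute named file on any element; substitute whatever attribute the deposit's schema actually uses. It does not validate the XML against a schema.

```python
import os
import xml.etree.ElementTree as ET


def missing_references(xml_path, base_dir, attribute="file"):
    """Return (missing, total) for the file paths named in `attribute`
    on any element of the XML document, resolved against base_dir.

    The attribute name is an assumption for illustration only.
    """
    tree = ET.parse(xml_path)
    referenced = [el.get(attribute) for el in tree.iter() if el.get(attribute)]
    missing = [ref for ref in referenced
               if not os.path.isfile(os.path.join(base_dir, ref))]
    return missing, len(referenced)
```

Dividing len(missing) by total gives the proportion of referenced files that are absent, and the missing list doubles as a request to send back to the depositor.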
Render data
Decode file format
File formats reflect the tools & software available at the point of creation:
• Information content
• Contextual information (documentation/metadata)
Analyse organisational structure
Intrinsic relationships are important for decoding multi-file objects:
• Filenames & directory structure
Solution:
• Specialist software may be required for access
• Liaise with data creators
https://www.flickr.com/photos/hawksanddoves/83818392
How many locks do you have to get through to reach your destination?
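Decoding starts with knowing what a file is. Many formats begin with a fixed byte signature (a "magic number"), which identification tools compare against large signature registries. A minimal Python sketch with a handful of well-known signatures follows; the table and function names are illustrative assumptions, and a file that matches nothing is simply reported as unknown.

```python
# (offset, signature bytes, label): a deliberately tiny, illustrative table.
SIGNATURES = [
    (0, b"%PDF-", "PDF document"),
    (0, b"II*\x00", "TIFF image (little-endian)"),
    (0, b"MM\x00*", "TIFF image (big-endian)"),
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0, b"\xff\xd8\xff", "JPEG image"),
    (0, b"PK\x03\x04", "ZIP container"),
    (0, b"<?xml", "XML document"),
    (0, b"FCS2.0", "Flow Cytometry Standard 2.0"),
    (0, b"FCS3.0", "Flow Cytometry Standard 3.0"),
]


def identify_bytes(header: bytes) -> str:
    """Return the label of the first signature found at its offset."""
    for offset, magic, label in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return label
    return "unknown"


def identify_file(path, read_size=64):
    """Identify a file from its first read_size bytes."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(read_size))
```

A signature match says which format a file claims to be, not that it is well formed; validation is a separate step.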
Case Study: Scientific dataset
USB stick containing an LSHTM dataset:
• FCS 2.0: tabular data from experiments to count cells, sort them & identify biomarkers
• Leica Experiment Collection: .lei library file & associated images with embedded metadata
Challenges:
• Domain-specific & proprietary formats
– FITS (file) provides limited information on .lei
– FCS not recognised
• Complex relationships in the Leica experiment, recorded in filenames & an internal manifest
Partial solution:
• Store files as-is
• Obtain text output of FCS files
• Analyse using open source tools
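Obtaining a text view of an FCS file largely means reading its TEXT segment of keyword/value pairs. This Python sketch assumes the usual FCS header layout: a six-character version string such as FCS2.0, four spaces, then pairs of 8-byte, space-padded ASCII byte offsets (start and inclusive end) for the TEXT, DATA and ANALYSIS segments. It also assumes that the TEXT segment opens with its delimiter character, and it ignores the doubled-delimiter escaping the standard allows, so check it against the FCS specification before relying on it. The function names are hypothetical.

```python
HEADER_LENGTH = 58  # version (6) + spaces (4) + six 8-byte offset fields


def read_fcs_header(data: bytes) -> dict:
    """Parse the fixed-width FCS header into a version and segment offsets."""
    if len(data) < HEADER_LENGTH or not data.startswith(b"FCS"):
        raise ValueError("not an FCS file")

    def offset(start: int) -> int:
        # Offsets are ASCII integers, right-justified and space-padded.
        return int(data[start:start + 8].decode("ascii").strip() or 0)

    return {
        "version": data[0:6].decode("ascii"),
        "text": (offset(10), offset(18)),
        "data": (offset(26), offset(34)),
        "analysis": (offset(42), offset(50)),
    }


def read_fcs_text(data: bytes) -> dict:
    """Return the TEXT segment's keyword/value pairs as a dictionary.

    Simplification: doubled delimiters (the escape for a literal
    delimiter inside a value) are not handled.
    """
    begin, end = read_fcs_header(data)["text"]
    segment = data[begin:end + 1].decode("latin-1")  # end offset is inclusive
    delimiter = segment[0]
    tokens = segment[1:].split(delimiter)
    if tokens and tokens[-1] == "":
        tokens.pop()  # the segment normally closes with the delimiter
    return dict(zip(tokens[0::2], tokens[1::2]))
```

Writing the resulting pairs out one per line gives a plain-text record that can be stored alongside the original file, which itself is kept as-is.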
Understand data
• The 17th-18th century Enlightenment was built on information sharing
• Openness & transparency are essential for academic research
– Evidence of activity
– Open to scrutiny & replication
• Can you establish who, what, where, when & how?
• How much documentation can only be found in the data creator’s head?
https://www.flickr.com/photos/domiriel/5234590796
Case Study: Adolphe Appia
The University of Warwick School of Theatre Studies modelled the performance space of Appia's Festspielhaus at Hellerau.
Collection deposited on several CDs:
• Digitised photographs of a 1991 performance
• VRML 3D models of the performance space
• Videos of the 3D models in .mov format
• Documentation & metadata
Problem:
• Image metadata ‘disappeared’ on transfer
• Descriptions had been added to file attributes, which were removed when the files were written to disc
Solution:
• Output the file attributes to a text file
• Compressed the files and copied them to disc
© King's Visualisation Lab, King's College London
http://www.kvl.cch.kcl.ac.uk/appia.html
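Moving descriptions out of file attributes and into a text file that travels with the files can be sketched in Python as follows. Reading the attributes themselves is platform-specific, so this version assumes they have already been collected into a dictionary mapping file names to descriptions (a hypothetical input). It writes them as a tab-separated sidecar inside a ZIP archive together with the files, so the descriptions no longer depend on the destination file system. The function and sidecar names are assumptions.

```python
import os
import zipfile


def _one_line(text):
    """Tabs and newlines would break the tab-separated sidecar, so collapse them."""
    return " ".join(text.split())


def package_with_sidecar(files_dir, descriptions, archive_path,
                         sidecar_name="descriptions.txt"):
    """Zip the top-level files of files_dir plus a plain-text sidecar of
    descriptions (one 'filename<TAB>description' line per entry)."""
    lines = [f"{name}\t{_one_line(text)}\n"
             for name, text in sorted(descriptions.items())]
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(os.listdir(files_dir)):
            full = os.path.join(files_dir, name)
            if os.path.isfile(full):  # sub-directories are skipped in this sketch
                archive.write(full, arcname=name)
        # The sidecar lives inside the archive, so it travels with the files.
        archive.writestr(sidecar_name, "".join(lines))
    return archive_path
```

Because the sidecar is plain text, it stays readable even if the archive is later unpacked onto a file system that drops attributes again.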
Final thoughts
1. Analyse your needs & capabilities
– What can you do with existing resources?
– What future investment is possible?
2. Inform users of your expectations from the outset
– File formats
– Documentation
– File structure & naming conventions
– Permissions
3. Help them to fulfil expectations
– Advice and guidance
http://www.keepcalm-o-matic.co.uk/p/keep-calm-and-curate-41/
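Published expectations are easier to follow, and to check, when they are explicit. The sketch below assumes a hypothetical policy: a short list of accepted file extensions and a naming pattern of the form project_YYYYMMDD_description.ext. Neither is a standard; a service would substitute its own rules. It reports each file that breaks them, so the depositor can be advised before ingest.

```python
import re
from pathlib import PurePosixPath

# Hypothetical expectations a service might publish to depositors.
ALLOWED_EXTENSIONS = {".csv", ".txt", ".pdf", ".tif", ".xml"}
# project_YYYYMMDD_description.ext using lowercase letters, digits and hyphens
NAME_PATTERN = re.compile(r"^[a-z0-9-]+_\d{8}_[a-z0-9-]+\.[a-z0-9]+$")


def check_deposit(filenames):
    """Map each non-conforming filename to a list of the problems found."""
    problems = {}
    for name in filenames:
        path = PurePosixPath(name)
        issues = []
        suffix = path.suffix.lower()
        if suffix not in ALLOWED_EXTENSIONS:
            issues.append(f"unexpected format {suffix or '(none)'}")
        if not NAME_PATTERN.match(path.name):
            issues.append("does not follow naming convention")
        if issues:
            problems[name] = issues
    return problems
```

An empty result means every file met the stated expectations; anything else is a prompt for advice and guidance rather than an automatic rejection.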

Más contenido relacionado

La actualidad más candente

Making Materials Findable at the State Library of Victoria
Making Materials Findable at the State Library of VictoriaMaking Materials Findable at the State Library of Victoria
Making Materials Findable at the State Library of VictoriaAlan Manifold
 
10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...
10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...
10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...DuraSpace
 
Linked Data: thinking big, starting small
Linked Data: thinking big, starting smallLinked Data: thinking big, starting small
Linked Data: thinking big, starting smallPeter Neish
 
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
10-31-13 “Researcher Perspectives of Data Curation” Presentation SlidesDuraSpace
 
Tuesday 5 May 2020: Contextualizing and engaging with Web domains, Valérie Sc...
Tuesday 5 May 2020: Contextualizing and engaging with Web domains, Valérie Sc...Tuesday 5 May 2020: Contextualizing and engaging with Web domains, Valérie Sc...
Tuesday 5 May 2020: Contextualizing and engaging with Web domains, Valérie Sc...WARCnet
 
Inmagic user group meeting Melbourne june 2011
Inmagic user group meeting Melbourne june 2011Inmagic user group meeting Melbourne june 2011
Inmagic user group meeting Melbourne june 2011Peter Neish
 
Harvesting and semantically tagging media releases from political websites us...
Harvesting and semantically tagging media releases from political websites us...Harvesting and semantically tagging media releases from political websites us...
Harvesting and semantically tagging media releases from political websites us...Peter Neish
 
ACS National Meeting - Libraries as Hubs for Emerging Technologies - 14_0813
ACS National Meeting - Libraries as Hubs for Emerging Technologies - 14_0813ACS National Meeting - Libraries as Hubs for Emerging Technologies - 14_0813
ACS National Meeting - Libraries as Hubs for Emerging Technologies - 14_0813jeffreylancaster
 
GIS Day 2015: Geoinformatics, Open Source and Videos - a library perspective
GIS Day 2015: Geoinformatics, Open Source and Videos - a library perspectiveGIS Day 2015: Geoinformatics, Open Source and Videos - a library perspective
GIS Day 2015: Geoinformatics, Open Source and Videos - a library perspectivePeter Löwe
 
Incubating Apache Linda (ApacheCon Europe 2012)
Incubating Apache Linda (ApacheCon Europe 2012)Incubating Apache Linda (ApacheCon Europe 2012)
Incubating Apache Linda (ApacheCon Europe 2012)Sergio Fernández
 
Aggregation Using Linked Data – LOCAH Project Experiences
Aggregation Using Linked Data – LOCAH Project ExperiencesAggregation Using Linked Data – LOCAH Project Experiences
Aggregation Using Linked Data – LOCAH Project ExperiencesAdrian Stevenson
 
PhD Thesis Digitisation Project
PhD Thesis Digitisation ProjectPhD Thesis Digitisation Project
PhD Thesis Digitisation ProjectLorna Campbell
 
IPTC Semantic Web Working Group 2011 Autumn Working Group
IPTC Semantic Web Working Group 2011 Autumn Working GroupIPTC Semantic Web Working Group 2011 Autumn Working Group
IPTC Semantic Web Working Group 2011 Autumn Working GroupStuart Myles
 
ACS Summer Institute - Emerging Roles of Librarians - 14_0731
ACS Summer Institute - Emerging Roles of Librarians - 14_0731ACS Summer Institute - Emerging Roles of Librarians - 14_0731
ACS Summer Institute - Emerging Roles of Librarians - 14_0731jeffreylancaster
 
Bingham, De Wild & Aasman Presentation
Bingham, De Wild & Aasman PresentationBingham, De Wild & Aasman Presentation
Bingham, De Wild & Aasman PresentationWARCnet
 
Historical Photographs of China - the journey towards sustainability and utility
Historical Photographs of China - the journey towards sustainability and utilityHistorical Photographs of China - the journey towards sustainability and utility
Historical Photographs of China - the journey towards sustainability and utilitySimon Price
 
Bibliosight (UKCoRR presentation)
Bibliosight (UKCoRR presentation)Bibliosight (UKCoRR presentation)
Bibliosight (UKCoRR presentation)Nick Sheppard
 

La actualidad más candente (20)

Making Materials Findable at the State Library of Victoria
Making Materials Findable at the State Library of VictoriaMaking Materials Findable at the State Library of Victoria
Making Materials Findable at the State Library of Victoria
 
10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...
10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...
10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...
 
Linked Data: thinking big, starting small
Linked Data: thinking big, starting smallLinked Data: thinking big, starting small
Linked Data: thinking big, starting small
 
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
 
Tuesday 5 May 2020: Contextualizing and engaging with Web domains, Valérie Sc...
Tuesday 5 May 2020: Contextualizing and engaging with Web domains, Valérie Sc...Tuesday 5 May 2020: Contextualizing and engaging with Web domains, Valérie Sc...
Tuesday 5 May 2020: Contextualizing and engaging with Web domains, Valérie Sc...
 
Inmagic user group meeting Melbourne june 2011
Inmagic user group meeting Melbourne june 2011Inmagic user group meeting Melbourne june 2011
Inmagic user group meeting Melbourne june 2011
 
Harvesting and semantically tagging media releases from political websites us...
Harvesting and semantically tagging media releases from political websites us...Harvesting and semantically tagging media releases from political websites us...
Harvesting and semantically tagging media releases from political websites us...
 
ACS National Meeting - Libraries as Hubs for Emerging Technologies - 14_0813
ACS National Meeting - Libraries as Hubs for Emerging Technologies - 14_0813ACS National Meeting - Libraries as Hubs for Emerging Technologies - 14_0813
ACS National Meeting - Libraries as Hubs for Emerging Technologies - 14_0813
 
GIS Day 2015: Geoinformatics, Open Source and Videos - a library perspective
GIS Day 2015: Geoinformatics, Open Source and Videos - a library perspectiveGIS Day 2015: Geoinformatics, Open Source and Videos - a library perspective
GIS Day 2015: Geoinformatics, Open Source and Videos - a library perspective
 
Resources, resources, resources: the three rs of the Web
Resources, resources, resources: the three rs of the WebResources, resources, resources: the three rs of the Web
Resources, resources, resources: the three rs of the Web
 
Incubating Apache Linda (ApacheCon Europe 2012)
Incubating Apache Linda (ApacheCon Europe 2012)Incubating Apache Linda (ApacheCon Europe 2012)
Incubating Apache Linda (ApacheCon Europe 2012)
 
Linked Data
Linked DataLinked Data
Linked Data
 
Aggregation Using Linked Data – LOCAH Project Experiences
Aggregation Using Linked Data – LOCAH Project ExperiencesAggregation Using Linked Data – LOCAH Project Experiences
Aggregation Using Linked Data – LOCAH Project Experiences
 
PhD Thesis Digitisation Project
PhD Thesis Digitisation ProjectPhD Thesis Digitisation Project
PhD Thesis Digitisation Project
 
Wetzel, Baish, Johnson, Reich, and Grant "Digital Preservation: Current Efforts"
Wetzel, Baish, Johnson, Reich, and Grant "Digital Preservation: Current Efforts"Wetzel, Baish, Johnson, Reich, and Grant "Digital Preservation: Current Efforts"
Wetzel, Baish, Johnson, Reich, and Grant "Digital Preservation: Current Efforts"
 
IPTC Semantic Web Working Group 2011 Autumn Working Group
IPTC Semantic Web Working Group 2011 Autumn Working GroupIPTC Semantic Web Working Group 2011 Autumn Working Group
IPTC Semantic Web Working Group 2011 Autumn Working Group
 
ACS Summer Institute - Emerging Roles of Librarians - 14_0731
ACS Summer Institute - Emerging Roles of Librarians - 14_0731ACS Summer Institute - Emerging Roles of Librarians - 14_0731
ACS Summer Institute - Emerging Roles of Librarians - 14_0731
 
Bingham, De Wild & Aasman Presentation
Bingham, De Wild & Aasman PresentationBingham, De Wild & Aasman Presentation
Bingham, De Wild & Aasman Presentation
 
Historical Photographs of China - the journey towards sustainability and utility
Historical Photographs of China - the journey towards sustainability and utilityHistorical Photographs of China - the journey towards sustainability and utility
Historical Photographs of China - the journey towards sustainability and utility
 
Bibliosight (UKCoRR presentation)
Bibliosight (UKCoRR presentation)Bibliosight (UKCoRR presentation)
Bibliosight (UKCoRR presentation)
 

Similar a Making Sense of a Digital Collection

Managing Software Selection and Acquisition: From Problem to Solution
Managing Software Selection and Acquisition: From Problem to SolutionManaging Software Selection and Acquisition: From Problem to Solution
Managing Software Selection and Acquisition: From Problem to Solutionsuyu22
 
Data Archiving and Sharing
Data Archiving and SharingData Archiving and Sharing
Data Archiving and SharingC. Tobin Magle
 
3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
3.7.17 DSpace for Data: issues, solutions and challenges Webinar SlidesDuraSpace
 
Impact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationImpact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationMANENDRASINGH30
 
Keynote: Unexpected repurposing
Keynote: Unexpected repurposingKeynote: Unexpected repurposing
Keynote: Unexpected repurposinglabsbl
 
Managing provenance in the Social Sciences: the Data Documentation Initiative...
Managing provenance in the Social Sciences: the Data Documentation Initiative...Managing provenance in the Social Sciences: the Data Documentation Initiative...
Managing provenance in the Social Sciences: the Data Documentation Initiative...ARDC
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management EcosystemJohn Kunze
 
Presentation Slides, “Creating Access to Audio & Video Digital Media: The Va...
Presentation Slides, “Creating Access to Audio & Video Digital Media:  The Va...Presentation Slides, “Creating Access to Audio & Video Digital Media:  The Va...
Presentation Slides, “Creating Access to Audio & Video Digital Media: The Va...DuraSpace
 
Navigating the Analog Waves: Digitizing Audio Cassettes for Your Collection
Navigating the Analog Waves: Digitizing Audio Cassettes for Your CollectionNavigating the Analog Waves: Digitizing Audio Cassettes for Your Collection
Navigating the Analog Waves: Digitizing Audio Cassettes for Your CollectionKay Gregg
 
"Filling the Digital Preservation Gap" with Archivematica
"Filling the Digital Preservation Gap" with Archivematica"Filling the Digital Preservation Gap" with Archivematica
"Filling the Digital Preservation Gap" with ArchivematicaJenny Mitcham
 
From Data to Data: One version of a History of Scholarly Communication
From Data to Data: One version of a History of Scholarly CommunicationFrom Data to Data: One version of a History of Scholarly Communication
From Data to Data: One version of a History of Scholarly CommunicationAndrew Treloar
 
RDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management EcosystemRDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management EcosystemASIS&T
 
Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...rmacneil88
 
Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014ResearchSpace
 
Pitts Library Digitization Initiatives
Pitts Library Digitization InitiativesPitts Library Digitization Initiatives
Pitts Library Digitization Initiativesjbweave
 
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & MuseumsALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & MuseumsJon Voss
 
Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Jeroen Rombouts
 
RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...
RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...
RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...ASIS&T
 

Similar a Making Sense of a Digital Collection (20)

Ji cv6n1
Ji cv6n1Ji cv6n1
Ji cv6n1
 
Managing Software Selection and Acquisition: From Problem to Solution
Managing Software Selection and Acquisition: From Problem to SolutionManaging Software Selection and Acquisition: From Problem to Solution
Managing Software Selection and Acquisition: From Problem to Solution
 
Data Archiving and Sharing
Data Archiving and SharingData Archiving and Sharing
Data Archiving and Sharing
 
3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
 
Impact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationImpact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and Education
 
Keynote: Unexpected repurposing
Keynote: Unexpected repurposingKeynote: Unexpected repurposing
Keynote: Unexpected repurposing
 
Managing provenance in the Social Sciences: the Data Documentation Initiative...
Managing provenance in the Social Sciences: the Data Documentation Initiative...Managing provenance in the Social Sciences: the Data Documentation Initiative...
Managing provenance in the Social Sciences: the Data Documentation Initiative...
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management Ecosystem
 
The future of the DCC
The future of the DCCThe future of the DCC
The future of the DCC
 
Presentation Slides, “Creating Access to Audio & Video Digital Media: The Va...
Presentation Slides, “Creating Access to Audio & Video Digital Media:  The Va...Presentation Slides, “Creating Access to Audio & Video Digital Media:  The Va...
Presentation Slides, “Creating Access to Audio & Video Digital Media: The Va...
 
Navigating the Analog Waves: Digitizing Audio Cassettes for Your Collection
Navigating the Analog Waves: Digitizing Audio Cassettes for Your CollectionNavigating the Analog Waves: Digitizing Audio Cassettes for Your Collection
Navigating the Analog Waves: Digitizing Audio Cassettes for Your Collection
 
"Filling the Digital Preservation Gap" with Archivematica
"Filling the Digital Preservation Gap" with Archivematica"Filling the Digital Preservation Gap" with Archivematica
"Filling the Digital Preservation Gap" with Archivematica
 
From Data to Data: One version of a History of Scholarly Communication
From Data to Data: One version of a History of Scholarly CommunicationFrom Data to Data: One version of a History of Scholarly Communication
From Data to Data: One version of a History of Scholarly Communication
 
RDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management EcosystemRDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management Ecosystem
 
Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...
 
Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014
 
Pitts Library Digitization Initiatives
Pitts Library Digitization InitiativesPitts Library Digitization Initiatives
Pitts Library Digitization Initiatives
 
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & MuseumsALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
 
Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10
 
RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...
RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...
RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...
 

Más de GarethKnight

Supporting Open Science in Research
Supporting Open Science in ResearchSupporting Open Science in Research
Supporting Open Science in ResearchGarethKnight
 
Building Sustainability: Preserving research data without breaking the bank
Building Sustainability: Preserving research data without breaking the bankBuilding Sustainability: Preserving research data without breaking the bank
Building Sustainability: Preserving research data without breaking the bankGarethKnight
 
GIS: A project by project prospective
GIS: A project by project prospectiveGIS: A project by project prospective
GIS: A project by project prospectiveGarethKnight
 
Complying with EPSRC policy: An LSHTM case study
Complying with EPSRC policy: An LSHTM case studyComplying with EPSRC policy: An LSHTM case study
Complying with EPSRC policy: An LSHTM case studyGarethKnight
 
Data Management for Librarians: An Introduction
Data Management for Librarians: An IntroductionData Management for Librarians: An Introduction
Data Management for Librarians: An IntroductionGarethKnight
 
Challenges in setting up an RDM Support Service
Challenges in setting up an RDM Support ServiceChallenges in setting up an RDM Support Service
Challenges in setting up an RDM Support ServiceGarethKnight
 
Research Data Management: What is it and why is the Library & Archives Servic...
Research Data Management: What is it and why is the Library & Archives Servic...Research Data Management: What is it and why is the Library & Archives Servic...
Research Data Management: What is it and why is the Library & Archives Servic...GarethKnight
 
Doing research better: The role of meta‐data
Doing research better: The role of meta‐dataDoing research better: The role of meta‐data
Doing research better: The role of meta‐dataGarethKnight
 
Laying the Foundation: Establishing an institutional RDM Support Service for ...
Laying the Foundation: Establishing an institutional RDM Support Service for ...Laying the Foundation: Establishing an institutional RDM Support Service for ...
Laying the Foundation: Establishing an institutional RDM Support Service for ...GarethKnight
 
Preservation Planning: Choosing a suitable digital preservation strategy
Preservation Planning: Choosing a suitable digital preservation strategyPreservation Planning: Choosing a suitable digital preservation strategy
Preservation Planning: Choosing a suitable digital preservation strategyGarethKnight
 
Watching the Detectives: Using digital forensics techniques to investigate th...
Watching the Detectives: Using digital forensics techniques to investigate th...Watching the Detectives: Using digital forensics techniques to investigate th...
Watching the Detectives: Using digital forensics techniques to investigate th...GarethKnight
 
Introduction to digital curation
Introduction to digital curationIntroduction to digital curation
Introduction to digital curationGarethKnight
 
Digital Forensics in the Archive
Digital Forensics in the ArchiveDigital Forensics in the Archive
Digital Forensics in the ArchiveGarethKnight
 
Keep Calm and Curate
Keep Calm and CurateKeep Calm and Curate
Keep Calm and CurateGarethKnight
 
Same as it ever was? Significant Properties and the preservation of meaning o...
Same as it ever was? Significant Properties and the preservation of meaning o...Same as it ever was? Significant Properties and the preservation of meaning o...
Same as it ever was? Significant Properties and the preservation of meaning o...GarethKnight
 
Who Decides? Reinterpreting archival processes for the management of digital ...
Who Decides? Reinterpreting archival processes for the management of digital ...Who Decides? Reinterpreting archival processes for the management of digital ...
Who Decides? Reinterpreting archival processes for the management of digital ...GarethKnight
 
Establishing the significant properties of digital research
Establishing the significant properties of digital researchEstablishing the significant properties of digital research
Establishing the significant properties of digital researchGarethKnight
 

Más de GarethKnight (17)

Supporting Open Science in Research
Supporting Open Science in ResearchSupporting Open Science in Research
Supporting Open Science in Research
 
Building Sustainability: Preserving research data without breaking the bank
Building Sustainability: Preserving research data without breaking the bankBuilding Sustainability: Preserving research data without breaking the bank
Building Sustainability: Preserving research data without breaking the bank
 
GIS: A project by project prospective
GIS: A project by project prospectiveGIS: A project by project prospective
GIS: A project by project prospective
 
Complying with EPSRC policy: An LSHTM case study
Complying with EPSRC policy: An LSHTM case studyComplying with EPSRC policy: An LSHTM case study
Complying with EPSRC policy: An LSHTM case study
 
Data Management for Librarians: An Introduction
Data Management for Librarians: An IntroductionData Management for Librarians: An Introduction
Data Management for Librarians: An Introduction
 
Challenges in setting up an RDM Support Service
Challenges in setting up an RDM Support ServiceChallenges in setting up an RDM Support Service
Challenges in setting up an RDM Support Service
 
Research Data Management: What is it and why is the Library & Archives Servic...
Research Data Management: What is it and why is the Library & Archives Servic...Research Data Management: What is it and why is the Library & Archives Servic...
Research Data Management: What is it and why is the Library & Archives Servic...
 
Doing research better: The role of meta‐data
Doing research better: The role of meta‐dataDoing research better: The role of meta‐data
Doing research better: The role of meta‐data
 
Laying the Foundation: Establishing an institutional RDM Support Service for ...
Laying the Foundation: Establishing an institutional RDM Support Service for ...Laying the Foundation: Establishing an institutional RDM Support Service for ...
Laying the Foundation: Establishing an institutional RDM Support Service for ...
 
Preservation Planning: Choosing a suitable digital preservation strategy
Preservation Planning: Choosing a suitable digital preservation strategyPreservation Planning: Choosing a suitable digital preservation strategy
Preservation Planning: Choosing a suitable digital preservation strategy
 
Watching the Detectives: Using digital forensics techniques to investigate th...
Watching the Detectives: Using digital forensics techniques to investigate th...Watching the Detectives: Using digital forensics techniques to investigate th...
Watching the Detectives: Using digital forensics techniques to investigate th...
 
Introduction to digital curation
Introduction to digital curationIntroduction to digital curation
Introduction to digital curation
 
Digital Forensics in the Archive
Digital Forensics in the ArchiveDigital Forensics in the Archive
Digital Forensics in the Archive
 
Keep Calm and Curate
Keep Calm and CurateKeep Calm and Curate
Keep Calm and Curate
 
Same as it ever was? Significant Properties and the preservation of meaning o...
Same as it ever was? Significant Properties and the preservation of meaning o...Same as it ever was? Significant Properties and the preservation of meaning o...
Same as it ever was? Significant Properties and the preservation of meaning o...
 
Who Decides? Reinterpreting archival processes for the management of digital ...
Who Decides? Reinterpreting archival processes for the management of digital ...Who Decides? Reinterpreting archival processes for the management of digital ...
Who Decides? Reinterpreting archival processes for the management of digital ...
 
Establishing the significant properties of digital research
 

Making Sense of a Digital Collection

  • 1. MAKING SENSE OF A COLLECTION
    This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License
    Gareth Knight, London School of Hygiene & Tropical Medicine, gareth.knight@lshtm.ac.uk
    Getting Started in Digital Preservation, The Information Technologists, London, 23rd April 2015
  • 2. Case Studies
    • National service that preserved research, teaching and learning resources in arts & humanities between 1996 and 2008
    • Institutional RDM service that helps LSHTM researchers to curate & preserve research data in public health & tropical medicine
  • 3. Need for Digital Preservation
    Data + Storage media + Computing device + Operating system + Software application = Information
    – Deteriorate & change over time
    – Obsolete & replaced over time
    What does this mean?
    “Digital information lasts forever – or five years, whichever comes first” (Jeff Rothenberg, 1997)
  • 4. Climb the preservation mountain
    “the series of managed activities necessary to ensure continued access to digital materials for as long as necessary.” (Neil Beagrie and Maggie Jones, 2008)
    Modified version of Caplan’s Preservation Pyramid:
    – Content can be used
    – Content is understandable
    – Content is rendered accurately
    – Bits are stored exactly
    – Its value is recognised & it is acquired
    – Data exists
    Beagrie & Jones: http://www.dpconline.org/advice/preservationhandbook/introduction/definitions-and-concepts
    Caplan: http://journals.ala.org/ltr/article/view/4224/4809
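The "Bits are stored exactly" level is usually checked with fixity information. A checksum is recorded for each file when it is acquired, then recomputed during later audits. The slides do not prescribe a method. The sketch below is a minimal example that assumes SHA-256 from Python's standard library, and its function names are illustrative only.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_fixity(root: Path, recorded: dict[str, str]) -> list[str]:
    """Return the files that are now missing or whose checksum has changed."""
    return [
        rel
        for rel, expected in recorded.items()
        if not (root / rel).is_file() or sha256_of(root / rel) != expected
    ]


if __name__ == "__main__":
    # Demonstration on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "record.txt").write_text("original content")
        baseline = record_fixity(root)
        (root / "record.txt").write_text("silently altered")
        print(verify_fixity(root, baseline))  # prints ['record.txt']
```

Store the baseline separately from the collection, for example as a manifest file. Otherwise the same storage failure can damage both the data and its checksums.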
  • 5. Digital Detectives
    • Digital preservation is often a process of investigation & deduction
    • Resource intensive:
      – Time
      – Physical space
      – Hardware/software costs
    • How much effort are you willing to make? What is good enough?
    https://www.flickr.com/photos/ollieolarte/3028314931
  • 6. Acquire data
    Acquisition depends upon the object to be preserved & how it is stored:
    • Media: floppy disk, CD/DVD, ZIP/Jaz disk, hard disk, solid state devices, etc.
    • Electronic: email, cloud services
    Invest in infrastructure to support the preservation process:
    • Computer hardware
    • Media readers
    • 3rd party services can provide advice and hardware rental where needed
    https://www.flickr.com/photos/adactio/13127134455
  • 7. Case Study: AHDS History dataset
    Deposited by the children of a noted researcher in 2006 & processed by GK
    Documentation: accompanying notes in the researcher’s handwriting described a history DB they were working on in 1988
    Challenges:
    • A 5.25" disk drive was available
    • The disk was failing, but a complete copy was created on the 5th attempt
    • Disk analysis revealed text content… the author's short stories, not a dataset!
    Result: not accessioned, but the children were pleased
    History database created in 1988 on a Shelton Instruments Sig-Net running the CP/M 2.2 operating system & saved to 5.25" disk
    http://www.old-computers.com/museum/computer.asp?st=1&c=810
  • 8. Check completeness
    What does the creator intend to provide?
    • Data
    • Documentation
    • Research instruments
    What have they actually provided?
    • Some data
    • Creation software & random files
    • Personal music collection?
    • Request a file manifest:
      – Filename
      – Description
      – Format
    https://www.flickr.com/photos/kyngpao/14455832915
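A draft of the filename / description / format manifest can be generated automatically and sent to the depositor to complete. In the sketch below, the CSV layout and function names are assumptions. The "format" column is only a guess from the file extension via `mimetypes`, not a real format identification, and the description column is left blank for the creator to fill in.

```python
import csv
import io
import mimetypes
import tempfile
from pathlib import Path

FIELDS = ["filename", "description", "format"]


def build_manifest(root: Path) -> list[dict[str, str]]:
    """List every file under root as a draft manifest row.

    'description' is left blank for the depositor; 'format' is an
    extension-based guess and should be confirmed, not trusted.
    """
    rows = []
    for p in sorted(root.rglob("*")):
        if p.is_file():
            mime, _encoding = mimetypes.guess_type(p.name)
            rows.append({
                "filename": p.relative_to(root).as_posix(),
                "description": "",
                "format": mime or "unknown",
            })
    return rows


def manifest_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialise manifest rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data").mkdir()
        (root / "data" / "notes.txt").write_text("field notes")
        print(manifest_to_csv(build_manifest(root)))
```

Comparing this draft with what the creator says they meant to deposit is a quick way to spot gaps, as well as unexpected extras such as a personal music collection.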
  • 9. Case Study: Early English Books Online
    Collection of 125,000 early printed books deposited for preservation:
    • XML files, scanned TIFFs & PDFs for each page
    • Well structured & labelled
    Problems:
    • Hard disk was failing
    • XML output from the content management system had an incomplete header & missing schema
    • 30% of files referenced in the XML were missing
    Solution:
    • Obtained the schema & missing files (but it took a long, long time)
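A missing-files check like this one can be scripted: collect every file reference in the XML, then test whether each referenced file is present. The slides do not say how the XML pointed at its files, so the attribute name is a parameter here, and the `img` attribute in the demo is hypothetical. Namespaced attributes such as `xlink:href` appear in ElementTree under their expanded `{namespace}href` key.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def referenced_files(xml_text: str, attr: str) -> set[str]:
    """Collect the distinct values of `attr` anywhere in the document."""
    root = ET.fromstring(xml_text)
    return {el.attrib[attr] for el in root.iter() if attr in el.attrib}


def missing_report(xml_text: str, base_dir: Path, attr: str) -> tuple[list[str], float]:
    """Return referenced files absent from base_dir and the percentage missing."""
    refs = referenced_files(xml_text, attr)
    missing = sorted(r for r in refs if not (base_dir / r).is_file())
    percent = 100.0 * len(missing) / len(refs) if refs else 0.0
    return missing, percent


if __name__ == "__main__":
    # Hypothetical markup: each <page> names its scan in an `img` attribute.
    sample = '<book><page img="p001.tif"/><page img="p002.tif"/></book>'
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "p001.tif").write_bytes(b"")
        print(missing_report(sample, Path(tmp), "img"))  # (['p002.tif'], 50.0)
```

This check only works once the XML parses. An incomplete header, as in this case, has to be repaired first.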
  • 10. Render data
    Decode file format
    • Formats reflect the tools & software available at the point of creation:
      – Information content
      – Contextual information (documentation/metadata)
    Analyse organisation structure
    • Intrinsic relationships are important for decoding multi-file objects:
      – Filenames & directory structure
    Solution:
    • Specialist software may be required for access
    • Liaise with data creators
    https://www.flickr.com/photos/hawksanddoves/83818392
    How many locks do you have to get through to reach your destination?
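Filenames and directory structure can give a first view of which files belong together. The sketch below groups files that share a directory and a base name, such as an image and its metadata sidecar. It relies on one heuristic assumption: related files differ only by extension. Relationships recorded inside files, such as an internal manifest, will not show up. The function name is illustrative.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def group_by_stem(root: Path) -> dict[str, list[str]]:
    """Group files sharing a directory and base name (e.g. scan01.tif + scan01.xml).

    Only groups with more than one member are returned: these are candidate
    multi-file objects whose parts should be kept and described together.
    """
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for p in sorted(root.rglob("*")):
        if p.is_file():
            key = p.relative_to(root).with_suffix("").as_posix()
            groups[key].append(p.name)
    return {key: names for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "exp1").mkdir()
        for name in ("scan01.tif", "scan01.xml", "readme.txt"):
            (root / "exp1" / name).write_bytes(b"")
        print(group_by_stem(root))
```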
  • 11. Case Study: Scientific dataset
    USB stick of an LSHTM dataset containing:
    • FCS2.0: tabular data outlining experiments to count cells, sort them & identify biomarkers
    • Leica Experiment Collection: .lei library file & associated images with embedded metadata
    Challenges:
    • Domain-specific & proprietary formats
      – FITS (file) provides limited info on .lei
      – FCS not recognised
    • Complex relationships in the Leica experiment, recorded in filenames & a partial internal manifest
    Solution:
    • Store files as-is
    • Obtain text output of FCS files
    • Analyse using open source tools
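Text output of FCS files can be obtained without vendor software because the FCS header is plain ASCII. It starts with a version string such as `FCS2.0`, then four spaces, then right-justified byte offsets. Bytes 10-17 and 18-25 give the first and last byte of the TEXT segment, which holds delimiter-separated keyword/value pairs. The sketch below is based on that header layout. It is an illustration, not a validated reader: it ignores delimiters escaped by doubling, supplemental TEXT segments and the DATA segment, and the function names are hypothetical.

```python
import tempfile
from pathlib import Path


def read_fcs_text(data: bytes) -> tuple[str, dict[str, str]]:
    """Return the FCS version string and the TEXT-segment keywords."""
    version = data[0:6].decode("ascii", errors="replace")
    if not version.startswith("FCS"):
        raise ValueError("header does not start with 'FCS'")
    # Right-justified ASCII offsets; the end offset is inclusive.
    text_start = int(data[10:18].decode("ascii"))
    text_end = int(data[18:26].decode("ascii"))
    text = data[text_start:text_end + 1].decode("latin-1")
    delimiter = text[0]  # the first TEXT character defines the delimiter
    parts = text[1:].split(delimiter)
    if parts and parts[-1] == "":
        parts.pop()  # TEXT normally ends with a trailing delimiter
    return version, dict(zip(parts[0::2], parts[1::2]))


def fcs_text_dump(path: Path) -> str:
    """Render the keywords of one FCS file as tab-separated text lines."""
    version, keywords = read_fcs_text(path.read_bytes())
    lines = [f"# {path.name} ({version})"]
    lines += [f"{key}\t{value}" for key, value in keywords.items()]
    return "\n".join(lines)


def make_sample_fcs(keywords: dict[str, str]) -> bytes:
    """Build a tiny synthetic FCS 2.0 file (header + TEXT only) for testing."""
    text = ("/" + "".join(f"{k}/{v}/" for k, v in keywords.items())).encode("latin-1")
    start = 58  # the header occupies bytes 0-57
    offsets = [start, start + len(text) - 1, 0, 0, 0, 0]
    header = b"FCS2.0    " + b"".join(f"{n:>8}".encode("ascii") for n in offsets)
    return header + text


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "run1.fcs"
        sample.write_bytes(make_sample_fcs({"$TOT": "100", "$PAR": "2"}))
        print(fcs_text_dump(sample))
```

For real files, an established open-source FCS reader is likely to be more robust. A sketch like this one is mainly useful for checking what a file claims to contain.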
  • 12. Understand data
    • The 17th-18th century Enlightenment was built on information sharing
    • Openness & transparency are essential for academic research:
      – Evidence of activity
      – Open to scrutiny & replication
    • Can you establish who, what, where, when & how?
    • How much documentation can only be found in the data creator’s head?
    https://www.flickr.com/photos/domiriel/5234590796
  • 13. Case Study: Adolphe Appia
    Warwick University School of Theatre Studies modelled the performance space of Appia's Festspielhaus at Hellerau. The collection was deposited on several CDs:
    • Digitised photographs of 1991 performance
    • VRML 3D models of performance space
    • Videos of 3D models in .mov format
    • Documentation & metadata
    Problem:
    • Image metadata ‘disappeared’ on transfer
    Solution:
    • Descriptions had been added to file attributes, which were being removed when written to disc
    • Output file attributes to a text file
    • Compressed files and copied them to disk
    © King's Visualisation Lab, King's College London, http://www.kvl.cch.kcl.ac.uk/appia.html
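The Appia workaround was to write file attributes out to a text file, then compress everything before copying. The descriptions in the original case were probably filesystem-specific properties that Python's standard library cannot read portably. This sketch therefore assumes the attributes worth keeping are path, size and modification time. The listing name `attributes.txt` and the function names are hypothetical.

```python
import datetime as dt
import tempfile
import zipfile
from pathlib import Path

LISTING_NAME = "attributes.txt"  # hypothetical sidecar file name


def write_attributes(root: Path) -> Path:
    """Write one tab-separated line per file: path, size in bytes, UTC modification time."""
    listing = root / LISTING_NAME
    lines = ["path\tsize_bytes\tmodified_utc"]
    for p in sorted(root.rglob("*")):
        if p.is_file() and p != listing:
            st = p.stat()
            mtime = dt.datetime.fromtimestamp(st.st_mtime, tz=dt.timezone.utc)
            lines.append(f"{p.relative_to(root).as_posix()}\t{st.st_size}\t{mtime.isoformat()}")
    listing.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return listing


def package(root: Path, archive: Path) -> None:
    """Save the attribute listing beside the files, then zip them all together.

    `archive` should live outside `root`, otherwise it would be swept into itself.
    """
    write_attributes(root)
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in sorted(root.rglob("*")):
            if p.is_file():
                zf.write(p, arcname=p.relative_to(root).as_posix())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "appia"
        root.mkdir()
        (root / "photo01.tif").write_bytes(b"abc")
        package(root, Path(tmp) / "appia.zip")
        print((root / LISTING_NAME).read_text(encoding="utf-8"))
```

Keeping the listing as plain text inside the archive means the information survives even if the target medium or filesystem drops its own metadata again.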
  • 14. Final thoughts
    1. Analyse your needs & capabilities
       – What can you do with existing resources?
       – What future investment is possible?
    2. Inform users of your expectations from the outset
       – File formats
       – Documentation
       – File structure & naming conventions
       – Permissions
    3. Help them to fulfil expectations
       – Advice and guidance
    http://www.keepcalm-o-matic.co.uk/p/keep-calm-and-curate-41/

Editor's notes

  1. investigation & deduction