Melissa Terras, James Baker, James Hetherington, David Beavan, Martin Zaltz Austwick, Anne Welsh, Helen O'Neill, Will Finley, Oliver Duke-Williams, and Adam Farquhar
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Exceptions: quotations, embeds from external sources, logos, and marked images.
Enabling Complex Analysis of Large-Scale Digital Collections
Humanities Research, High Performance Computing, and transforming access to British Library Digital Collections
Data, code, viz: github.com/UCL-dataspring
Overview
Barriers to computational approaches:
● fragmentation of communities, resources, and tools;
● lack of interoperability;
● lack of technical skills
Method
60k books from the British Library:
● 17th to 19th century
● 224GB compressed ALTO XML
● UCL High Performance Computing
● 4 humanities researchers
● Research questions to computational queries
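As a rough illustration of what working with ALTO XML involves, the sketch below pulls the recognised words out of a single page. It relies only on the ALTO convention that each word is a String element carrying a CONTENT attribute. The function names, the sample page, and the namespace URI are illustrative assumptions, not the project's actual code; see the repository for that.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def alto_words(xml_source):
    """Yield the CONTENT of each ALTO <String> element, in document order.

    Streams the file with iterparse so that a large page is never held
    in memory as a full tree. Matching on the local tag name keeps the
    sketch independent of which ALTO schema version the file declares.
    """
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) == "String":
            content = elem.get("CONTENT")
            if content:
                yield content
        elem.clear()  # free the element once it has been read


# Hypothetical one-line page, invented for this example.
SAMPLE_PAGE = b"""<?xml version="1.0" encoding="UTF-8"?>
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="The"/><SP/><String CONTENT="cholera"/><SP/><String CONTENT="spread"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

words = list(alto_words(io.BytesIO(SAMPLE_PAGE)))
print(words)
```

On a cluster, a function like this would typically be mapped over many page files in parallel, with the word lists (or counts derived from them) written out as a smaller derived dataset.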
UCL’s Legion Cluster supercomputing facility. Photo: Tony Slade, © UCL Creative Media Services (all rights reserved)
Results
It worked!:
● Case Study 1: History of Medicine
● Case Study 2: History of Images
● Technical barriers
● Search ‘recipes’
Case Study 1
History of Medicine
Oliver Duke-Williams, UCL
Case Study 2
History of Images
Will Finley, Sheffield
Technical
Major sticking point:
● Using humanities data on HPCs
Best practice recommendations:
● Derived datasets
● Normalisations
● Documenting decisions
● Fixed/defined dataset
Generic searches:
● for all variants of a word
● that return keywords in context traced over time
● for a word or phrase that ignore another word or phrase
● for a word when in close proximity to a second word
● based on image metadata
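Several of the text-based recipes above can be sketched in a few lines of Python once the text of each book is available as a plain string. This is a minimal illustration under stated assumptions, not the project's code: the function names, the (year, text) input shape, and the sample sentences are all invented here, and "ignore another word" is read as "exclude texts that contain it".

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def keywords_in_context(tokens, variants, window=3):
    """Return (left, keyword, right) for every token found in `variants`."""
    hits = []
    for i, token in enumerate(tokens):
        if token in variants:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


def hits_by_year(books, variants):
    """Count variant occurrences per publication year, to trace use over time."""
    counts = Counter()
    for year, text in books:
        counts[year] += sum(1 for t in tokenize(text) if t in variants)
    return counts


def near(tokens, first, second, max_distance=5):
    """True if `first` occurs within `max_distance` tokens of `second`."""
    first_pos = [i for i, t in enumerate(tokens) if t == first]
    second_pos = [i for i, t in enumerate(tokens) if t == second]
    return any(abs(i - j) <= max_distance for i in first_pos for j in second_pos)


def mentions_without(tokens, wanted, unwanted):
    """True if `wanted` appears and `unwanted` does not."""
    return wanted in tokens and unwanted not in tokens


# Invented sample sentences, not taken from the British Library corpus.
books = [
    (1832, "The cholera spread quickly through the town."),
    (1849, "Physicians feared the cholera morbus and the fever."),
    (1866, "A fever hospital was opened near the river."),
]
variants = {"cholera", "choleraic"}

print(keywords_in_context(tokenize(books[0][1]), variants, window=2))
print(hits_by_year(books, variants))
print(near(tokenize(books[1][1]), "cholera", "fever"))
print(mentions_without(tokenize(books[2][1]), "fever", "cholera"))
```

Passing the variants in as an explicit set keeps the normalisation decision (which spellings count as the same word) visible and easy to document alongside the results.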
Conclusions
Recommendations for enabling complex analysis of large-scale digital collections in the humanities:
● 1 Invest in research software engineer capacity to deploy and maintain openly licensed large-scale digital collections from across the GLAM sector in order to facilitate research in the arts, humanities and social and historical sciences.
● 2 Invest in training library staff to run these initial queries in collaboration with humanities faculty, to support work with subsets of data that are produced, and to document and manage resulting code and derived data.
Special thanks to UCL Research Computing and British Library Digital Research for their hard work and support!
Data, code, viz: github.com/UCL-
dataspring
Melissa Terras, James
Baker, James
Hetherington, David
Beavan, Martin Zaltz
Austwick, Anne Welsh,
Helen O'Neill, Will Finley,
Oliver Duke-Williams, and
Adam Farquhar

Más contenido relacionado

La actualidad más candente

The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...Robert H. McDonald
 
Mahendra Mahey, British Library Labs
Mahendra Mahey, British Library LabsMahendra Mahey, British Library Labs
Mahendra Mahey, British Library LabsResearchLibrariesUK
 
JCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening SlidesJCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening SlidesRobert H. McDonald
 
New approaches for data acquisition at europeana iiif, sitemaps and schema.o...
New approaches for data acquisition at europeana  iiif, sitemaps and schema.o...New approaches for data acquisition at europeana  iiif, sitemaps and schema.o...
New approaches for data acquisition at europeana iiif, sitemaps and schema.o...Nuno Freire
 
NBK update briefing October 2017
NBK update briefing October 2017NBK update briefing October 2017
NBK update briefing October 2017Bethan Ruddock
 
Collection Directions - Research collections in the network environment
Collection Directions - Research collections in the network environmentCollection Directions - Research collections in the network environment
Collection Directions - Research collections in the network environmentConstance Malpas
 
Linked Data Implementations—Who, What and Why?
Linked Data Implementations—Who, What and Why?Linked Data Implementations—Who, What and Why?
Linked Data Implementations—Who, What and Why?OCLC
 
British Library Labs Competition Presentation - Digital Humanities, Universit...
British Library Labs Competition Presentation - Digital Humanities, Universit...British Library Labs Competition Presentation - Digital Humanities, Universit...
British Library Labs Competition Presentation - Digital Humanities, Universit...labsbl
 
COUNTER Standards for Open Access: The Value of Measuring/The Measuring of Va...
COUNTER Standards for Open Access: The Value of Measuring/The Measuring of Va...COUNTER Standards for Open Access: The Value of Measuring/The Measuring of Va...
COUNTER Standards for Open Access: The Value of Measuring/The Measuring of Va...LIBER Europe
 
BL Labs and Digital Humanities
BL Labs and Digital HumanitiesBL Labs and Digital Humanities
BL Labs and Digital Humanitieslabsbl
 
British Library Labs and Competition Presentation at the Open University
British Library Labs and Competition Presentation at the Open UniversityBritish Library Labs and Competition Presentation at the Open University
British Library Labs and Competition Presentation at the Open Universitylabsbl
 
Showcasing research data tools
Showcasing research data toolsShowcasing research data tools
Showcasing research data toolsJisc RDM
 
IIIF as an Enabler to Interoperability within a Single Institution
IIIF as an Enabler to Interoperability within a Single InstitutionIIIF as an Enabler to Interoperability within a Single Institution
IIIF as an Enabler to Interoperability within a Single InstitutionIIIF_io
 
Linked Open Data Approaches within the ARIADNE Project
Linked Open Data Approaches within the ARIADNE ProjectLinked Open Data Approaches within the ARIADNE Project
Linked Open Data Approaches within the ARIADNE Projectariadnenetwork
 
British Library Labs Virtual Event - 17 May 2013, 1500GMT
British Library Labs Virtual Event - 17 May 2013, 1500GMTBritish Library Labs Virtual Event - 17 May 2013, 1500GMT
British Library Labs Virtual Event - 17 May 2013, 1500GMTlabsbl
 
IIIF Pre-conference - Usability testing conducted on the UV and Mirador
IIIF Pre-conference - Usability testing conducted on the UV and MiradorIIIF Pre-conference - Usability testing conducted on the UV and Mirador
IIIF Pre-conference - Usability testing conducted on the UV and MiradorJulien A. Raemy
 

La actualidad más candente (20)

The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
 
Mahendra Mahey, British Library Labs
Mahendra Mahey, British Library LabsMahendra Mahey, British Library Labs
Mahendra Mahey, British Library Labs
 
JCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening SlidesJCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening Slides
 
New approaches for data acquisition at europeana iiif, sitemaps and schema.o...
New approaches for data acquisition at europeana  iiif, sitemaps and schema.o...New approaches for data acquisition at europeana  iiif, sitemaps and schema.o...
New approaches for data acquisition at europeana iiif, sitemaps and schema.o...
 
NBK update briefing October 2017
NBK update briefing October 2017NBK update briefing October 2017
NBK update briefing October 2017
 
Collection Directions - Research collections in the network environment
Collection Directions - Research collections in the network environmentCollection Directions - Research collections in the network environment
Collection Directions - Research collections in the network environment
 
Dash UCCSC 2016
Dash UCCSC 2016Dash UCCSC 2016
Dash UCCSC 2016
 
Linked Data Implementations—Who, What and Why?
Linked Data Implementations—Who, What and Why?Linked Data Implementations—Who, What and Why?
Linked Data Implementations—Who, What and Why?
 
British Library Labs Competition Presentation - Digital Humanities, Universit...
British Library Labs Competition Presentation - Digital Humanities, Universit...British Library Labs Competition Presentation - Digital Humanities, Universit...
British Library Labs Competition Presentation - Digital Humanities, Universit...
 
Mahendra Mahay's slides from the Bloomsbury DH Meeting 30/09/2013
Mahendra Mahay's slides from the Bloomsbury DH Meeting 30/09/2013Mahendra Mahay's slides from the Bloomsbury DH Meeting 30/09/2013
Mahendra Mahay's slides from the Bloomsbury DH Meeting 30/09/2013
 
Edina cigs-21-september-2012
Edina cigs-21-september-2012Edina cigs-21-september-2012
Edina cigs-21-september-2012
 
COUNTER Standards for Open Access: The Value of Measuring/The Measuring of Va...
COUNTER Standards for Open Access: The Value of Measuring/The Measuring of Va...COUNTER Standards for Open Access: The Value of Measuring/The Measuring of Va...
COUNTER Standards for Open Access: The Value of Measuring/The Measuring of Va...
 
BL Labs and Digital Humanities
BL Labs and Digital HumanitiesBL Labs and Digital Humanities
BL Labs and Digital Humanities
 
British Library Labs and Competition Presentation at the Open University
British Library Labs and Competition Presentation at the Open UniversityBritish Library Labs and Competition Presentation at the Open University
British Library Labs and Competition Presentation at the Open University
 
Showcasing research data tools
Showcasing research data toolsShowcasing research data tools
Showcasing research data tools
 
IIIF as an Enabler to Interoperability within a Single Institution
IIIF as an Enabler to Interoperability within a Single InstitutionIIIF as an Enabler to Interoperability within a Single Institution
IIIF as an Enabler to Interoperability within a Single Institution
 
Linked Open Data Approaches within the ARIADNE Project
Linked Open Data Approaches within the ARIADNE ProjectLinked Open Data Approaches within the ARIADNE Project
Linked Open Data Approaches within the ARIADNE Project
 
British Library Labs Virtual Event - 17 May 2013, 1500GMT
British Library Labs Virtual Event - 17 May 2013, 1500GMTBritish Library Labs Virtual Event - 17 May 2013, 1500GMT
British Library Labs Virtual Event - 17 May 2013, 1500GMT
 
Ukla uksg 2013_final
Ukla uksg 2013_finalUkla uksg 2013_final
Ukla uksg 2013_final
 
IIIF Pre-conference - Usability testing conducted on the UV and Mirador
IIIF Pre-conference - Usability testing conducted on the UV and MiradorIIIF Pre-conference - Usability testing conducted on the UV and Mirador
IIIF Pre-conference - Usability testing conducted on the UV and Mirador
 

Destacado (15)

Gdz ukrainska mova_bilyaev
Gdz ukrainska mova_bilyaevGdz ukrainska mova_bilyaev
Gdz ukrainska mova_bilyaev
 
Data Fusion Poster
Data Fusion PosterData Fusion Poster
Data Fusion Poster
 
ден на победата
ден на победатаден на победата
ден на победата
 
Uusimmat kohteet turkissa Asunto Alanyasta Turkista
Uusimmat kohteet turkissa Asunto Alanyasta TurkistaUusimmat kohteet turkissa Asunto Alanyasta Turkista
Uusimmat kohteet turkissa Asunto Alanyasta Turkista
 
The Hard Disk as the new Paper Archive
The Hard Disk as the new Paper ArchiveThe Hard Disk as the new Paper Archive
The Hard Disk as the new Paper Archive
 
Museum Ceria's Company Profile
Museum Ceria's Company ProfileMuseum Ceria's Company Profile
Museum Ceria's Company Profile
 
Museum Label for Kids ~ Ajeng
Museum Label for Kids ~ AjengMuseum Label for Kids ~ Ajeng
Museum Label for Kids ~ Ajeng
 
Importance on Conference Call Etiquette
Importance on Conference Call EtiquetteImportance on Conference Call Etiquette
Importance on Conference Call Etiquette
 
[SLIDE FACTORY] [CV slide] Vũ Trà Mi
[SLIDE FACTORY] [CV slide] Vũ Trà Mi[SLIDE FACTORY] [CV slide] Vũ Trà Mi
[SLIDE FACTORY] [CV slide] Vũ Trà Mi
 
Abstencionistas, abstenerse
Abstencionistas, abstenerseAbstencionistas, abstenerse
Abstencionistas, abstenerse
 
Tema 4 1 16
Tema 4 1 16Tema 4 1 16
Tema 4 1 16
 
Tema 5 hegemonía y transmisión de la cultura
Tema 5   hegemonía y transmisión de la cultura   Tema 5   hegemonía y transmisión de la cultura
Tema 5 hegemonía y transmisión de la cultura
 
1b) A2 Media - Language Analysis
1b) A2 Media - Language Analysis1b) A2 Media - Language Analysis
1b) A2 Media - Language Analysis
 
Microservices, DevOps, Continuous Delivery – More Than Three Buzzwords
Microservices, DevOps, Continuous Delivery – More Than Three BuzzwordsMicroservices, DevOps, Continuous Delivery – More Than Three Buzzwords
Microservices, DevOps, Continuous Delivery – More Than Three Buzzwords
 
Proposal Company Job Fair Depok
Proposal Company Job Fair DepokProposal Company Job Fair Depok
Proposal Company Job Fair Depok
 

Similar a Enabling Complex Analysis of Large-Scale Digital Collections: Humanities Research, High Performance Computing, and transforming access to British Library Digital Collections

Frankfurt Big Data Lab & Refugee Projeect
Frankfurt Big Data Lab & Refugee ProjeectFrankfurt Big Data Lab & Refugee Projeect
Frankfurt Big Data Lab & Refugee ProjeectGoethe Univeristy
 
Software and Education at NSF/ACI
Software and Education at NSF/ACISoftware and Education at NSF/ACI
Software and Education at NSF/ACIDaniel S. Katz
 
What is eScience, and where does it go from here?
What is eScience, and where does it go from here?What is eScience, and where does it go from here?
What is eScience, and where does it go from here?Daniel S. Katz
 
How practising open research can benefit you
How practising open research can benefit youHow practising open research can benefit you
How practising open research can benefit youUoLResearchSupport
 
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Bertram Ludäscher
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021Gérard Dupont
 
SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...Carole Goble
 
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...Trevor Owens
 
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...Sarah Anna Stewart
 
Research Data Management at Imperial College London
Research Data Management at Imperial College LondonResearch Data Management at Imperial College London
Research Data Management at Imperial College LondonSarah Anna Stewart
 
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...Carole Goble
 
Making Research Data Repositories Visible – The re3data.org Registry
Making Research Data Repositories Visible – The re3data.org RegistryMaking Research Data Repositories Visible – The re3data.org Registry
Making Research Data Repositories Visible – The re3data.org RegistryHeinz Pampel
 
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13DataDryad
 
Uk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseUk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseRDTF-Discovery
 
Impact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationImpact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationMANENDRASINGH30
 
INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎
INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎
INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎Libcorpio
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityOscar Corcho
 

Similar a Enabling Complex Analysis of Large-Scale Digital Collections: Humanities Research, High Performance Computing, and transforming access to British Library Digital Collections (20)

Frankfurt Big Data Lab & Refugee Projeect
Frankfurt Big Data Lab & Refugee ProjeectFrankfurt Big Data Lab & Refugee Projeect
Frankfurt Big Data Lab & Refugee Projeect
 
Software and Education at NSF/ACI
Software and Education at NSF/ACISoftware and Education at NSF/ACI
Software and Education at NSF/ACI
 
What is eScience, and where does it go from here?
What is eScience, and where does it go from here?What is eScience, and where does it go from here?
What is eScience, and where does it go from here?
 
How practising open research can benefit you
How practising open research can benefit youHow practising open research can benefit you
How practising open research can benefit you
 
E Infrastructure for OA
E Infrastructure for OAE Infrastructure for OA
E Infrastructure for OA
 
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021
 
SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...
 
Open science platforms
Open science platformsOpen science platforms
Open science platforms
 
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
 
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
 
Research Data Management at Imperial College London
Research Data Management at Imperial College LondonResearch Data Management at Imperial College London
Research Data Management at Imperial College London
 
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
 
Making Research Data Repositories Visible – The re3data.org Registry
Making Research Data Repositories Visible – The re3data.org RegistryMaking Research Data Repositories Visible – The re3data.org Registry
Making Research Data Repositories Visible – The re3data.org Registry
 
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
Zudilova-Seinstra-Elsevier-data and the article of the future-nfdp13
 
Uk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseUk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcase
 
Impact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationImpact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and Education
 
INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎
INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎
INNOVATION AND ‎RESEARCH (Digital Library ‎Information Access)‎
 
Ppt hk pres_final
Ppt hk pres_finalPpt hk pres_final
Ppt hk pres_final
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibility
 

Más de James Baker

1.5 million words of Mary Dorothy George: a computational approach to curator...
1.5 million words of Mary Dorothy George: a computational approach to curator...1.5 million words of Mary Dorothy George: a computational approach to curator...
1.5 million words of Mary Dorothy George: a computational approach to curator...James Baker
 
Digital History in the student learning experience
Digital History in the student learning experienceDigital History in the student learning experience
Digital History in the student learning experienceJames Baker
 
Decolonial Futures for Colonial Metadata, 1838-present
Decolonial Futures for Colonial Metadata, 1838-presentDecolonial Futures for Colonial Metadata, 1838-present
Decolonial Futures for Colonial Metadata, 1838-presentJames Baker
 
The Programming Historian: Open Access, Open Source, Open Project
The Programming Historian: Open Access, Open Source, Open ProjectThe Programming Historian: Open Access, Open Source, Open Project
The Programming Historian: Open Access, Open Source, Open ProjectJames Baker
 
Outlook: Email Archives , 1990-2007
Outlook: Email Archives , 1990-2007Outlook: Email Archives , 1990-2007
Outlook: Email Archives , 1990-2007James Baker
 
Forensic Recovery from Data Storage
Forensic Recovery from Data StorageForensic Recovery from Data Storage
Forensic Recovery from Data StorageJames Baker
 
Digital History in the student learning experience
Digital History in the student learning experienceDigital History in the student learning experience
Digital History in the student learning experienceJames Baker
 
Who is the Digital Historian?
Who is the Digital Historian?Who is the Digital Historian?
Who is the Digital Historian?James Baker
 
Image Recognition with Pastec
Image Recognition with PastecImage Recognition with Pastec
Image Recognition with PastecJames Baker
 
Publication and Dissemination of Data
Publication and Dissemination of DataPublication and Dissemination of Data
Publication and Dissemination of DataJames Baker
 
Library Carpentry: software skills training for library professionals, Chart...
 Library Carpentry: software skills training for library professionals, Chart... Library Carpentry: software skills training for library professionals, Chart...
Library Carpentry: software skills training for library professionals, Chart...James Baker
 
Hard disks as archives of everyday life
Hard disks as archives of everyday lifeHard disks as archives of everyday life
Hard disks as archives of everyday lifeJames Baker
 
Ditching the Digital
Ditching the DigitalDitching the Digital
Ditching the DigitalJames Baker
 
Complex Analysis of Large Scale Digital Collections: reflections on some oppo...
Complex Analysis of Large Scale Digital Collections: reflections on some oppo...Complex Analysis of Large Scale Digital Collections: reflections on some oppo...
Complex Analysis of Large Scale Digital Collections: reflections on some oppo...James Baker
 
The Hard Disk as the new Paper Archive: opportunities and challenges for hist...
The Hard Disk as the new Paper Archive: opportunities and challenges for hist...The Hard Disk as the new Paper Archive: opportunities and challenges for hist...
The Hard Disk as the new Paper Archive: opportunities and challenges for hist...James Baker
 
Acts of being in proxies for prints: People in the Catalogue of Political and...
Acts of being in proxies for prints: People in the Catalogue of Political and...Acts of being in proxies for prints: People in the Catalogue of Political and...
Acts of being in proxies for prints: People in the Catalogue of Political and...James Baker
 
Library Carpentry. Week One: Basics
Library Carpentry. Week One: BasicsLibrary Carpentry. Week One: Basics
Library Carpentry. Week One: BasicsJames Baker
 
On Open Access monograph publishing for Arts, Humanities and Social Science R...
On Open Access monograph publishing for Arts, Humanities and Social Science R...On Open Access monograph publishing for Arts, Humanities and Social Science R...
On Open Access monograph publishing for Arts, Humanities and Social Science R...James Baker
 
Me in three minutes
Me in three minutesMe in three minutes
Me in three minutesJames Baker
 
Acts of being in proxies for prints: People in the Catalogue of Political and...
Acts of being in proxies for prints: People in the Catalogue of Political and...Acts of being in proxies for prints: People in the Catalogue of Political and...
Acts of being in proxies for prints: People in the Catalogue of Political and...James Baker
 

Más de James Baker (20)

1.5 million words of Mary Dorothy George: a computational approach to curator...
1.5 million words of Mary Dorothy George: a computational approach to curator...1.5 million words of Mary Dorothy George: a computational approach to curator...
1.5 million words of Mary Dorothy George: a computational approach to curator...
 
Digital History in the student learning experience
Digital History in the student learning experienceDigital History in the student learning experience
Digital History in the student learning experience
 
Decolonial Futures for Colonial Metadata, 1838-present
Decolonial Futures for Colonial Metadata, 1838-presentDecolonial Futures for Colonial Metadata, 1838-present
Decolonial Futures for Colonial Metadata, 1838-present
 
The Programming Historian: Open Access, Open Source, Open Project
The Programming Historian: Open Access, Open Source, Open ProjectThe Programming Historian: Open Access, Open Source, Open Project
The Programming Historian: Open Access, Open Source, Open Project
 
Outlook: Email Archives , 1990-2007
Outlook: Email Archives , 1990-2007Outlook: Email Archives , 1990-2007
Outlook: Email Archives , 1990-2007
 
Forensic Recovery from Data Storage
Forensic Recovery from Data StorageForensic Recovery from Data Storage
Forensic Recovery from Data Storage
 
Digital History in the student learning experience
Digital History in the student learning experienceDigital History in the student learning experience
Digital History in the student learning experience
 
Who is the Digital Historian?
Who is the Digital Historian?Who is the Digital Historian?
Who is the Digital Historian?
 
Image Recognition with Pastec
Image Recognition with PastecImage Recognition with Pastec
Image Recognition with Pastec
 
Publication and Dissemination of Data
Publication and Dissemination of DataPublication and Dissemination of Data
Publication and Dissemination of Data
 
Library Carpentry: software skills training for library professionals, Chart...
 Library Carpentry: software skills training for library professionals, Chart... Library Carpentry: software skills training for library professionals, Chart...
Library Carpentry: software skills training for library professionals, Chart...
 
Hard disks as archives of everyday life
Hard disks as archives of everyday lifeHard disks as archives of everyday life
Hard disks as archives of everyday life
 
Ditching the Digital
Ditching the DigitalDitching the Digital
Ditching the Digital
 
Complex Analysis of Large Scale Digital Collections: reflections on some oppo...
Complex Analysis of Large Scale Digital Collections: reflections on some oppo...Complex Analysis of Large Scale Digital Collections: reflections on some oppo...
Complex Analysis of Large Scale Digital Collections: reflections on some oppo...
 
The Hard Disk as the new Paper Archive: opportunities and challenges for hist...
The Hard Disk as the new Paper Archive: opportunities and challenges for hist...The Hard Disk as the new Paper Archive: opportunities and challenges for hist...
The Hard Disk as the new Paper Archive: opportunities and challenges for hist...
 
Acts of being in proxies for prints: People in the Catalogue of Political and...
Acts of being in proxies for prints: People in the Catalogue of Political and...Acts of being in proxies for prints: People in the Catalogue of Political and...
Acts of being in proxies for prints: People in the Catalogue of Political and...
 
Library Carpentry. Week One: Basics
Library Carpentry. Week One: BasicsLibrary Carpentry. Week One: Basics
Library Carpentry. Week One: Basics
 
On Open Access monograph publishing for Arts, Humanities and Social Science R...
On Open Access monograph publishing for Arts, Humanities and Social Science R...On Open Access monograph publishing for Arts, Humanities and Social Science R...
On Open Access monograph publishing for Arts, Humanities and Social Science R...
 
Me in three minutes
Me in three minutesMe in three minutes
Me in three minutes
 
Acts of being in proxies for prints: People in the Catalogue of Political and...
Acts of being in proxies for prints: People in the Catalogue of Political and...Acts of being in proxies for prints: People in the Catalogue of Political and...
Acts of being in proxies for prints: People in the Catalogue of Political and...
 


Enabling Complex Analysis of Large-Scale Digital Collections: Humanities Research, High Performance Computing, and transforming access to British Library Digital Collections

  • 1. Melissa Terras, James Baker, James Hetherington, David Beavan, Martin Zaltz Austwick, Anne Welsh, Helen O'Neill, Will Finley, Oliver Duke-Williams, and Adam Farquhar. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Exceptions: quotations, embeds from external sources, logos, and marked images. Enabling Complex Analysis of Large-Scale Digital Collections: Humanities Research, High Performance Computing, and transforming access to British Library Digital Collections. Data, code, viz: github.com/UCL-dataspring
  • 2. Overview Barriers to computational approaches: ● fragmentation of communities, resources, and tools; ● lack of interoperability; ● lack of technical skills
  • 3. Method 60k books from the British Library: ● 17th - 19th century ● 224GB compressed ALTO XML ● UCL High Performance Computing ● 4 humanities researchers ● Research questions to computational queries
  • 4. UCL’s Legion Cluster supercomputing facility. Photo: Tony Slade, © UCL Creative Media Services (all rights reserved)
  • 5. Method 60k books from the British Library: ● 17th - 19th century ● 224GB compressed ALTO XML ● UCL High Performance Computing ● 4 humanities researchers ● Research questions to computational queries
  • 6. Results It worked!: ● Case Study 1: History of Medicine ● Case Study 2: History of Images ● Technical barriers ● Search ‘recipes’
  • 7. Case Study 1 History of Medicine Oliver Duke-Williams, UCL
  • 8. Case Study 2 History of Images Will Finley, Sheffield
  • 9. Case Study 2 History of Images Will Finley, Sheffield
  • 10. Technical Major sticking point: ● Using humanities data on HPCs Best practice recommendations: ● Derived datasets ● Normalisations ● Documenting decisions ● Fixed/defined dataset
  • 11. Generic searches: ● for all variants of a word ● that return keywords in context traced over time ● for a word or phrase that ignore another word or phrase ● for a word when in close proximity to a second word ● based on image metadata
  • 12. Conclusions Recommendations for enabling complex analysis of large-scale digital collections in the humanities: ● 1. Invest in research software engineer capacity to deploy and maintain openly licensed large-scale digital collections from across the GLAM sector, in order to facilitate research in the arts, humanities and social and historical sciences. ● 2. Invest in training library staff to run these initial queries in collaboration with humanities faculty, to support work with the subsets of data that are produced, and to document and manage the resulting code and derived data.
  • 13. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Exceptions: quotations, embeds from external sources, logos, and marked images. Special thanks to UCL Research Computing and British Library Digital Research for their hard work and support! Data, code, viz: github.com/UCL-dataspring. Melissa Terras, James Baker, James Hetherington, David Beavan, Martin Zaltz Austwick, Anne Welsh, Helen O'Neill, Will Finley, Oliver Duke-Williams, and Adam Farquhar
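
The generic searches listed on slide 11 can be illustrated with a minimal Python sketch. This is not the project's code (that lives at github.com/UCL-dataspring); it assumes the page text has already been extracted from the ALTO XML into plain strings, and the function names (`tokenize`, `variants`, `keyword_in_context`, `near`, `contains_without`) are hypothetical names chosen for illustration only.

```python
import re
from typing import Iterator, List, Tuple

# Assumption: a "word" is a run of ASCII letters, optionally with an apostrophe part.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def variants(tokens: List[str], pattern: str) -> List[int]:
    """Positions of tokens that fully match a regex, e.g. r"death\\w*"."""
    compiled = re.compile(pattern)
    return [i for i, tok in enumerate(tokens) if compiled.fullmatch(tok)]


def keyword_in_context(
    tokens: List[str], keyword: str, window: int = 3
) -> Iterator[Tuple[int, str]]:
    """Yield (position, snippet) for each occurrence of the keyword."""
    for i, tok in enumerate(tokens):
        if tok == keyword:
            start = max(0, i - window)
            yield i, " ".join(tokens[start : i + window + 1])


def near(tokens: List[str], first: str, second: str, distance: int = 5) -> List[int]:
    """Positions of `first` that have `second` within `distance` tokens."""
    second_positions = [i for i, tok in enumerate(tokens) if tok == second]
    return [
        i
        for i, tok in enumerate(tokens)
        if tok == first and any(abs(i - j) <= distance for j in second_positions)
    ]


def contains_without(tokens: List[str], wanted: str, unwanted: str) -> bool:
    """True if the text contains `wanted` but never `unwanted`."""
    return wanted in tokens and unwanted not in tokens


if __name__ == "__main__":
    # Made-up sample sentence, not taken from the British Library corpus.
    sample = "The cholera spread through the town. Clean water reduced cholera deaths."
    toks = tokenize(sample)
    for position, snippet in keyword_in_context(toks, "cholera", window=2):
        print(position, snippet)
    print(near(toks, "cholera", "water", distance=2))
    print(contains_without(toks, "cholera", "plague"))
```

To trace results over time, as the slide describes, each function would be run per book and the results grouped by publication year; that grouping depends on the corpus metadata and is left out of this sketch.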