SlideShare a Scribd company logo
Goobi at the Bodleian
BACKGROUND AND WORK SO FAR
Background
oExisting long-running and very experienced digitisation studio.
oPrimarily low-volume, very high-quality work. Special collections material.
oSome project-funded larger scale projects, but not in the recent past.
Existing systems
A mixture of bespoke applications, and a diverse mix of technologies:
•MySQL
•MS Access
•VBA
•Perl
•PHP
•Python
•Windows batch files
•Imagemagick
•Shell scripts / cron
‘Systems’ limitations
Physical hardware nearing end of lifetime.
Physical hardware performance inadequate for existing production volume.
Network limitations.
Commercially supported software at or past end of lifetime.
Bespoke or locally developed software past end of lifetime, and not suitable for incremental
upgrade and revision.
Lack of in-house resources to build a completely new workflow system from scratch.
Poor or non-existent documentation.
Project work and ‘mass’ digitisation
Newly funded major digitisation projects:
•Polonsky foundation: 500,000 images (3 years) – Greek & Hebrew manuscripts and incunabula.
•Chinese: 1,000,000 images.
Need to substantially increase production, while maintaining quality.
Existing systems already inadequate for current production levels.
Solution
Software workflow:
◦ Goobi – phased introduction. Phase 1: ‘large’ projects only, Phase 2: smaller commercial orders.
New hardware infrastructure:
◦ Dedicated server cluster (virtualised)
◦ Upgraded network infrastructure
◦ Custom built from the ground-up to support high-volume digitisation.
Repository:
◦ ‘Databank’
Delivery:
◦ Digital.Bodleian
◦ Viewer.Bodleian
Current State of Play
Software workflow:
◦ Goobi – Entering final testing phase, prior to roll-out.
New hardware infrastructure:
◦ Dedicated server cluster (virtualised on dedicated hardware) – In build and test.
◦ Upgraded network infrastructure – Nov. 2014 [move to a new building]
◦ Custom built from the ground-up to support high-volume digitisation.
Repository:
◦ ‘Databank’ – In production.
Delivery:
◦ Digital.Bodleian – ‘Soft’ launch, not in full public launch.
◦ Viewer.Bodleian – In production. Version 1.
Goobi workflow (1)
Create process
Insert UUID and export path [as process properties]
Order and check physical item
Photography
TIFF verification [JHOVE2]
Jpeg generation
Jpeg verification [JHOVE2]
QA
Jpeg2000 creation [Kakadu + Python]
Goobi workflow (2)
Jpeg2000 verification [JHOVE2]
Metadata entry
Metadata QA
Export to DMS
UUID generation [for page/image level records]
Generate derivative metadata [Dublin Core, IIIF]
Extract EXIF/XMP technical metadata [Exempi / Python]
Send to queue/workers for upload to repository [RabbitMQ, Databank]
Problems / Lessons learned
Metadata ‘ruleset’:
•Difficulties getting consensus from disparate groups of stake-holders, e.g. curators, and technical specialists.
•Information gathering / consultation time-consuming, and returns poor.
Systems integration:
•Difficulties integrating with elements of our own systems where no ‘out-of-the-box’ or standard solutions exist.
Systems performance:
•Networking bandwidth
•Server loads
•Working storage for ‘in-flight’ data.
•Efficient ‘pipe’ to final repository.
Ongoing problems / work remaining
Goobi only replaces part of our existing workflow.
Further development needed to integrate with on-line ordering, order/customer tracking, and
billing systems.
Further development needed to integrate with secure delivery mechanisms for commercial
orders.
Possible integration with other library systems and resources.

More Related Content

Similar to Goobi at the bodleian

Automatize everything
Automatize everythingAutomatize everything
Automatize everythingBoris Bucha
 
Filipe paternot - Case Study: Zabbix Deployment at Globo.com
Filipe paternot - Case Study: Zabbix Deployment at Globo.comFilipe paternot - Case Study: Zabbix Deployment at Globo.com
Filipe paternot - Case Study: Zabbix Deployment at Globo.comZabbix
 
DockerCon 15 Keynote - Day 2
DockerCon 15 Keynote - Day 2DockerCon 15 Keynote - Day 2
DockerCon 15 Keynote - Day 2Docker, Inc.
 
Cincom Smalltalk Roadmap 2010
Cincom Smalltalk Roadmap 2010Cincom Smalltalk Roadmap 2010
Cincom Smalltalk Roadmap 2010ESUG
 
Symfony2 for legacy app rejuvenation: the eZ Publish case study
Symfony2 for legacy app rejuvenation: the eZ Publish case studySymfony2 for legacy app rejuvenation: the eZ Publish case study
Symfony2 for legacy app rejuvenation: the eZ Publish case studyGaetano Giunta
 
Ultime Novità di Prodotto Neo4j
Ultime Novità di Prodotto Neo4j Ultime Novità di Prodotto Neo4j
Ultime Novità di Prodotto Neo4j Neo4j
 
Fluo CICD OpenStack Summit
Fluo CICD OpenStack SummitFluo CICD OpenStack Summit
Fluo CICD OpenStack SummitMiguel Zuniga
 
Web QA Gaia/B2G/Firefox OS front-end automation
Web QA Gaia/B2G/Firefox OS front-end automationWeb QA Gaia/B2G/Firefox OS front-end automation
Web QA Gaia/B2G/Firefox OS front-end automationStephen Donner
 
Portable infrastructure with puppet
Portable infrastructure with puppetPortable infrastructure with puppet
Portable infrastructure with puppetlkanies
 
Configuration Management Evolution at CERN
Configuration Management Evolution at CERNConfiguration Management Evolution at CERN
Configuration Management Evolution at CERNGavin McCance
 
State of Puppet 2013 - Puppet Camp DC
State of Puppet 2013 - Puppet Camp DCState of Puppet 2013 - Puppet Camp DC
State of Puppet 2013 - Puppet Camp DCPuppet
 
Continuous Delivery: releasing Better and Faster at Dashlane
Continuous Delivery: releasing Better and Faster at DashlaneContinuous Delivery: releasing Better and Faster at Dashlane
Continuous Delivery: releasing Better and Faster at DashlaneDashlane
 
Hadoop Demystified + Automation Smackdown! Austin JUG June 24 2014
Hadoop Demystified + Automation Smackdown!  Austin JUG June 24 2014Hadoop Demystified + Automation Smackdown!  Austin JUG June 24 2014
Hadoop Demystified + Automation Smackdown! Austin JUG June 24 2014datafundamentals
 
Puppet Keynote by Ralph Luchs
Puppet Keynote by Ralph LuchsPuppet Keynote by Ralph Luchs
Puppet Keynote by Ralph LuchsNETWAYS
 
MoldCamp - multidimentional testing workflow. CIBox.
MoldCamp  - multidimentional testing workflow. CIBox.MoldCamp  - multidimentional testing workflow. CIBox.
MoldCamp - multidimentional testing workflow. CIBox.Andrii Podanenko
 
Cost-effective e-Government Services: Export Control System phase 2 (ECS2)
Cost-effective e-Government Services: Export Control System phase 2 (ECS2)Cost-effective e-Government Services: Export Control System phase 2 (ECS2)
Cost-effective e-Government Services: Export Control System phase 2 (ECS2)Vladimir Alexiev, PhD, PMP
 
DEVNET-1112 The DevNet Hackathon Awards
DEVNET-1112	The DevNet Hackathon AwardsDEVNET-1112	The DevNet Hackathon Awards
DEVNET-1112 The DevNet Hackathon AwardsCisco DevNet
 
PHP Unconference Continuous Integration
PHP Unconference Continuous IntegrationPHP Unconference Continuous Integration
PHP Unconference Continuous IntegrationNils Hofmeister
 
DevOps for Big Data - Data 360 2014 Conference
DevOps for Big Data - Data 360 2014 ConferenceDevOps for Big Data - Data 360 2014 Conference
DevOps for Big Data - Data 360 2014 ConferenceGrid Dynamics
 
Continuous Delivery at Wix
Continuous Delivery at WixContinuous Delivery at Wix
Continuous Delivery at WixYoav Avrahami
 

Similar to Goobi at the bodleian (20)

Automatize everything
Automatize everythingAutomatize everything
Automatize everything
 
Filipe paternot - Case Study: Zabbix Deployment at Globo.com
Filipe paternot - Case Study: Zabbix Deployment at Globo.comFilipe paternot - Case Study: Zabbix Deployment at Globo.com
Filipe paternot - Case Study: Zabbix Deployment at Globo.com
 
DockerCon 15 Keynote - Day 2
DockerCon 15 Keynote - Day 2DockerCon 15 Keynote - Day 2
DockerCon 15 Keynote - Day 2
 
Cincom Smalltalk Roadmap 2010
Cincom Smalltalk Roadmap 2010Cincom Smalltalk Roadmap 2010
Cincom Smalltalk Roadmap 2010
 
Symfony2 for legacy app rejuvenation: the eZ Publish case study
Symfony2 for legacy app rejuvenation: the eZ Publish case studySymfony2 for legacy app rejuvenation: the eZ Publish case study
Symfony2 for legacy app rejuvenation: the eZ Publish case study
 
Ultime Novità di Prodotto Neo4j
Ultime Novità di Prodotto Neo4j Ultime Novità di Prodotto Neo4j
Ultime Novità di Prodotto Neo4j
 
Fluo CICD OpenStack Summit
Fluo CICD OpenStack SummitFluo CICD OpenStack Summit
Fluo CICD OpenStack Summit
 
Web QA Gaia/B2G/Firefox OS front-end automation
Web QA Gaia/B2G/Firefox OS front-end automationWeb QA Gaia/B2G/Firefox OS front-end automation
Web QA Gaia/B2G/Firefox OS front-end automation
 
Portable infrastructure with puppet
Portable infrastructure with puppetPortable infrastructure with puppet
Portable infrastructure with puppet
 
Configuration Management Evolution at CERN
Configuration Management Evolution at CERNConfiguration Management Evolution at CERN
Configuration Management Evolution at CERN
 
State of Puppet 2013 - Puppet Camp DC
State of Puppet 2013 - Puppet Camp DCState of Puppet 2013 - Puppet Camp DC
State of Puppet 2013 - Puppet Camp DC
 
Continuous Delivery: releasing Better and Faster at Dashlane
Continuous Delivery: releasing Better and Faster at DashlaneContinuous Delivery: releasing Better and Faster at Dashlane
Continuous Delivery: releasing Better and Faster at Dashlane
 
Hadoop Demystified + Automation Smackdown! Austin JUG June 24 2014
Hadoop Demystified + Automation Smackdown!  Austin JUG June 24 2014Hadoop Demystified + Automation Smackdown!  Austin JUG June 24 2014
Hadoop Demystified + Automation Smackdown! Austin JUG June 24 2014
 
Puppet Keynote by Ralph Luchs
Puppet Keynote by Ralph LuchsPuppet Keynote by Ralph Luchs
Puppet Keynote by Ralph Luchs
 
MoldCamp - multidimentional testing workflow. CIBox.
MoldCamp  - multidimentional testing workflow. CIBox.MoldCamp  - multidimentional testing workflow. CIBox.
MoldCamp - multidimentional testing workflow. CIBox.
 
Cost-effective e-Government Services: Export Control System phase 2 (ECS2)
Cost-effective e-Government Services: Export Control System phase 2 (ECS2)Cost-effective e-Government Services: Export Control System phase 2 (ECS2)
Cost-effective e-Government Services: Export Control System phase 2 (ECS2)
 
DEVNET-1112 The DevNet Hackathon Awards
DEVNET-1112	The DevNet Hackathon AwardsDEVNET-1112	The DevNet Hackathon Awards
DEVNET-1112 The DevNet Hackathon Awards
 
PHP Unconference Continuous Integration
PHP Unconference Continuous IntegrationPHP Unconference Continuous Integration
PHP Unconference Continuous Integration
 
DevOps for Big Data - Data 360 2014 Conference
DevOps for Big Data - Data 360 2014 ConferenceDevOps for Big Data - Data 360 2014 Conference
DevOps for Big Data - Data 360 2014 Conference
 
Continuous Delivery at Wix
Continuous Delivery at WixContinuous Delivery at Wix
Continuous Delivery at Wix
 

Recently uploaded

Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxEduSkills OECD
 
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...Nguyen Thanh Tu Collection
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonSteve Thomason
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfjoachimlavalley1
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...Nguyen Thanh Tu Collection
 
Basic Civil Engg Notes_Chapter-6_Environment Pollution & Engineering
Basic Civil Engg Notes_Chapter-6_Environment Pollution & EngineeringBasic Civil Engg Notes_Chapter-6_Environment Pollution & Engineering
Basic Civil Engg Notes_Chapter-6_Environment Pollution & EngineeringDenish Jangid
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXMIRIAMSALINAS13
 
Gyanartha SciBizTech Quiz slideshare.pptx
Gyanartha SciBizTech Quiz slideshare.pptxGyanartha SciBizTech Quiz slideshare.pptx
Gyanartha SciBizTech Quiz slideshare.pptxShibin Azad
 
Research Methods in Psychology | Cambridge AS Level | Cambridge Assessment In...
Research Methods in Psychology | Cambridge AS Level | Cambridge Assessment In...Research Methods in Psychology | Cambridge AS Level | Cambridge Assessment In...
Research Methods in Psychology | Cambridge AS Level | Cambridge Assessment In...Abhinav Gaur Kaptaan
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaasiemaillard
 
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptxJose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptxricssacare
 
How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17Celine George
 
Benefits and Challenges of Using Open Educational Resources
Benefits and Challenges of Using Open Educational ResourcesBenefits and Challenges of Using Open Educational Resources
Benefits and Challenges of Using Open Educational Resourcesdimpy50
 
Matatag-Curriculum and the 21st Century Skills Presentation.pptx
Matatag-Curriculum and the 21st Century Skills Presentation.pptxMatatag-Curriculum and the 21st Century Skills Presentation.pptx
Matatag-Curriculum and the 21st Century Skills Presentation.pptxJenilouCasareno
 
INU_CAPSTONEDESIGN_비밀번호486_업로드용 발표자료.pdf
INU_CAPSTONEDESIGN_비밀번호486_업로드용 발표자료.pdfINU_CAPSTONEDESIGN_비밀번호486_업로드용 발표자료.pdf
INU_CAPSTONEDESIGN_비밀번호486_업로드용 발표자료.pdfbu07226
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxJheel Barad
 
How to Manage Notification Preferences in the Odoo 17
How to Manage Notification Preferences in the Odoo 17How to Manage Notification Preferences in the Odoo 17
How to Manage Notification Preferences in the Odoo 17Celine George
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePedroFerreira53928
 

Recently uploaded (20)

NCERT Solutions Power Sharing Class 10 Notes pdf
NCERT Solutions Power Sharing Class 10 Notes pdfNCERT Solutions Power Sharing Class 10 Notes pdf
NCERT Solutions Power Sharing Class 10 Notes pdf
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
 
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
 
Basic Civil Engg Notes_Chapter-6_Environment Pollution & Engineering
Basic Civil Engg Notes_Chapter-6_Environment Pollution & EngineeringBasic Civil Engg Notes_Chapter-6_Environment Pollution & Engineering
Basic Civil Engg Notes_Chapter-6_Environment Pollution & Engineering
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
Gyanartha SciBizTech Quiz slideshare.pptx
Gyanartha SciBizTech Quiz slideshare.pptxGyanartha SciBizTech Quiz slideshare.pptx
Gyanartha SciBizTech Quiz slideshare.pptx
 
Research Methods in Psychology | Cambridge AS Level | Cambridge Assessment In...
Research Methods in Psychology | Cambridge AS Level | Cambridge Assessment In...Research Methods in Psychology | Cambridge AS Level | Cambridge Assessment In...
Research Methods in Psychology | Cambridge AS Level | Cambridge Assessment In...
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptxJose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
 
How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17
 
Benefits and Challenges of Using Open Educational Resources
Benefits and Challenges of Using Open Educational ResourcesBenefits and Challenges of Using Open Educational Resources
Benefits and Challenges of Using Open Educational Resources
 
Operations Management - Book1.p - Dr. Abdulfatah A. Salem
Operations Management - Book1.p  - Dr. Abdulfatah A. SalemOperations Management - Book1.p  - Dr. Abdulfatah A. Salem
Operations Management - Book1.p - Dr. Abdulfatah A. Salem
 
Matatag-Curriculum and the 21st Century Skills Presentation.pptx
Matatag-Curriculum and the 21st Century Skills Presentation.pptxMatatag-Curriculum and the 21st Century Skills Presentation.pptx
Matatag-Curriculum and the 21st Century Skills Presentation.pptx
 
INU_CAPSTONEDESIGN_비밀번호486_업로드용 발표자료.pdf
INU_CAPSTONEDESIGN_비밀번호486_업로드용 발표자료.pdfINU_CAPSTONEDESIGN_비밀번호486_업로드용 발표자료.pdf
INU_CAPSTONEDESIGN_비밀번호486_업로드용 발표자료.pdf
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
How to Manage Notification Preferences in the Odoo 17
How to Manage Notification Preferences in the Odoo 17How to Manage Notification Preferences in the Odoo 17
How to Manage Notification Preferences in the Odoo 17
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
 

Goobi at the bodleian

  • 1. Goobi at the Bodleian BACKGROUND AND WORK SO FAR
  • 2. Background oExisting long-running and very experienced digitisation studio. oPrimarily low-volume, very high-quality work. Special collections material. oSome project-funded larger scale projects, but not in the recent past.
  • 3. Existing systems A mixture of bespoke applications, and a diverse mix of technologies: •MySQL •MS Access •VBA •Perl •PHP •Python •Windows batch files •Imagemagick •Shell scripts / cron
  • 4. ‘Systems’ limitations Physical hardware nearing end of lifetime. Physical hardware performance inadequate for existing production volume. Network limitations. Commercially supported software at or past end of lifetime. Bespoke or locally developed software past end of lifetime, and not suitable for incremental upgrade and revision. Lack of in-house resources to build a completely new workflow system from scratch. Poor or non-existent documentation.
  • 5. Project work and ‘mass’ digitisation Newly funded major digitisation projects: •Polonsky foundation: 500,000 images (3 years) – Greek & Hebrew manuscripts and incunabula. •Chinese: 1,000,000 images. Need to substantially increase production, while maintaining quality. Existing systems already inadequate for current production levels.
  • 6. Solution Software workflow: ◦ Goobi – phased introduction. Phase 1: ‘large’ projects only, Phase 2: smaller commercial orders. New hardware infrastructure: ◦ Dedicated server cluster (virtualised) ◦ Upgraded network infrastructure ◦ Custom built from the ground-up to support high-volume digitisation. Repository: ◦ ‘Databank’ Delivery: ◦ Digital.Bodleian ◦ Viewer.Bodleian
  • 7. Current State of Play Software workflow: ◦ Goobi – Entering final testing phase, prior to roll-out. New hardware infrastructure: ◦ Dedicated server cluster (virtualised on dedicated hardware) – In build and test. ◦ Upgraded network infrastructure – Nov. 2014 [move to a new building] ◦ Custom built from the ground-up to support high-volume digitisation. Repository: ◦ ‘Databank’ – In production. Delivery: ◦ Digital.Bodleian – ‘Soft’ launch, not in full public launch. ◦ Viewer.Bodleian – In production. Version 1.
  • 8. Goobi workflow (1) Create process Insert UUID and export path [as process properties] Order and check physical item Photography TIFF verification [JHOVE2] Jpeg generation Jpeg verification [JHOVE2] QA Jpeg2000 creation [Kakadu + Python]
  • 9. Goobi workflow (2) Jpeg2000 verification [JHOVE2] Metadata entry Metadata QA Export to DMS UUID generation [for page/image level records] Generate derivative metadata [Dublin Core, IIIF] Extract EXIF/XMP technical metadata [Exempi / Python] Send to queue/workers for upload to repository [RabbitMQ, Databank]
  • 10. Problems / Lessons learned Metadata ‘ruleset’: •Difficulties getting consensus from disparate groups of stake-holders, e.g. curators, and technical specialists. •Information gathering / consultation time-consuming, and returns poor. Systems integration: •Difficulties integrating with elements of our own systems where no ‘out-of-the-box’ or standard solutions exist. Systems performance: •Networking bandwidth •Server loads •Working storage for ‘in-flight’ data. •Efficient ‘pipe’ to final repository.
  • 11. Ongoing problems / work remaining Goobi only replaces part of our existing workflow. Further development needed to integrate with on-line ordering, order/customer tracking, and billing systems. Further development needed to integrate with secure delivery mechanisms for commercial orders. Possible integration with other library systems and resources.