SlideShare una empresa de Scribd logo
1 de 25
Descargar para leer sin conexión
Flexible Design for Simple Digital Library Tools and
Services
Lighton Phiri Hussein Suleman
Digital Libraries Laboratory
Department of Computer Science
University of Cape Town
October 8, 2013
SARU archaeological database@ UCT
2 of 26
http://www.martinwest.uct.ac.za
3 of 26
http://lloydbleekcollection.cs.uct.ac.za
4 of 26
Contextual overview
Problem and motivation
Preservation costs
Technical skills and education
Computing resources
Proposed solution
Simplicity and minimalism
Successes of minimalism —Project Gutenburg
Principled DL design
Prior work
Derivation of design principles
Repository implementation
Real-world case studies
5 of 26
Repository prototype architectural design
File-based
Digital objects stored on native operating system
Hierarchical collection structure
Metadata objects
Plain text files
Encoded using Dublin Core
Relationships modelled using metadata elements
Object organisation
Metadata records stored alongside objects
Content objects and container objects nested within other container
objects
6 of 26
User study experiment
Objective
Developer-oriented
Simplicity and flexibility of file-based store
Target population
34 Computer Science honours students
12 groups of twos and threes
Skillset
Technologies relevant to study —DBMS, XML, Web apps
Storage solutions
Digital Libraries concepts
Approach
Subjects tasked to build layered services using file-based store
Marks awarded for innovation —among other facets
Subjects answered post-experiment survey
7 of 26
User study experiment - results
Survey participants
76% response rate —representation from all 12 groups
Group Web service Candidates Respondents
Group 1 Transcription 3 3
Group 2 Downloader 3 3
Group 3 Commenting 3 1
Group 4 Visualisation 3 2
Group 5 Transcription 3 2
Group 6 Annotation 2 2
Group 7 Visualisation 3 3
Group 8 Browsing 3 3
Group 9 Annotation 3 2
Group 10 Rating 3 2
Group 11 Gestures 3 1
Group 12 Visualisation 2 2
8 of 26
User study experiment - results (1)
Programming languages usage
C#
Java
Python
HTML5
PHP
JavaScript
0 5 10 15
Number of subjects
Programminglanguages
9 of 26
User study experiment - results (2)
Simplicity
Understandability
Metadata
Structure
Metadata
Structure
0 5 10 15 20 25
Number of subjects
Repositoryaspects
Strongly agree Agree Neutral Disagree Strongly disagree
10 of 26
User study experiment - results (3)
Users asked to rank storage solutions in order of preference
What aspects of your most preferred solution [database] above do
you find particularly valuable?
“I understand databases better.”
“Simple to set up and sheer control”
“Easy setup and connection to MySQL database”
“Ease of data manipulation and relations”
“Centralised management, ease of design, availability of
support/literature”
“The existing infrastructure for storing and retrieving data”
11 of 26
User study experiment - results (4)
Do you have any general comments about the data structure or
format?
“Had some difficulty working the metadata, despite looking at how to
process DC metadata online, it slowed us down considerably.”
“Good structure although confusing that each page has no metadata
of its own(only the story).”
“The hierarchy was not intuitive therefore took a while to understand
however having crossed that hurdle was fairly easy to process.”
“I guess it was OK but took some getting used to”
12 of 26
User study experiment - findings
Simplicity resulted in more understandable structure
69% agreed that XML-files were simple
61% found XML format easy to work with
62% found hierarchical structure simple to work with
46% found hierarchical structure easily understandable
Simplicity does not affect flexibility of interaction with file-store
No influence on choice of language
Only 15% of subjects thought it did
13 of 26
Performance experiment
Objective
Assess performance relative to collection size
Test Environment
Pentium(R) Dual-Core CPU E5200@ 2.50GHz; 4GB RAM
32 bit Ubuntu 12.01 LTS
Siege and ApacheBench for benchmarking
Metrics
Response time
Factors
Collection hierarchical structure
Collection size —digital objects
14 of 26
Performance experiment - test dataset
NDLTD Union Catalog —http://union.ndltd.org/OAI-PMH
Harvested 1 907 000 metadata records
Dublin Core-encoded plain text files
Linearly increasing workload
Workload Objects Cols Size [MB]
W1 100 19 0.54
W2 200 25 1.00
W3 400 42 2.00
...
...
...
...
W13 409 600 128 1945.00
W14 819 200 131 3788.80
W15 1 638 400 131 7680.00
15 of 26
Performance experiment - test dataset (2)
Two datasets spawned from initial dataset
one-, two- and three-level structures
NDLTD
OCLC
...
object
...
...
...
...
(a) Dataset #1
NDLTD
OCLC
2010
...
object
...
...
...
(b) Dataset #2
NDLTD
OCLC
2010
z
...
object
...
...
(c) Dataset #3
16 of 26
Performance experiment - evaluation aspects
Transaction log analysis —http://pubs.cs.uct.ac.za
Ingestion
Full-text search
Indexing operations
OAI-PMH data provider
Feed generation
17 of 26
Performance experiment - experimental design
Performance benchmarking
Evaluation aspects
Three-run averages for all scenarios
Datasets #1, #2 and #3
15 workloads
Break-even points for performance degradation
Nielsen’s three important limits for response times
Performance comparisons
Benchmark results vs DSpace 3.1
Ingestion
Full-text search
OAI-PMH data provider
18 of 26
Performance experiment - results
Item ingestion
2.5
2.6
2.7
2.8
2.9
3.0 100
200
400
800
1.6k
3.2k
6.4k
12.8k
25.6k
51.2k
102.4k
204.8k
409.6k
819.2k
1638.4k
Workload size
Time[ms]
Dataset #1 Dataset #2 Dataset #3
19 of 26
Performance experiment - results (2)
Full-text search
100
105
1010
1015
100
200
400
800
1.6k
3.2k
6.4k
12.8k
25.6k
51.2k
102.4k
204.8k
409.6k
819.2k
1638.4k
Workload size
log10(Time[ms])
Traversal time Parsing time XPath time
20 of 26
Performance experiment - results (3)
OAI-PMH data provider
10-2
10-1
100
101
102
103 100
200
400
800
1.6k
3.2k
6.4k
12.8k
25.6k
51.2k
102.4k
204.8k
409.6k
819.2k
1638.4k
Workload size
log10(Time[ms])
GetRecord ListIdentifiers ListRecords ListSets
21 of 26
Performance experiment - results (4)
Index
Ingest
OAI-PMH
Feed
Search
100
102
104
106
log10(Time[ms])
100
200
400
800
1.6k
3.2k
6.4k
12.8k
25.6k
51.2k
102.4k
204.8k
409.6k
819.2k
1638.4k
22 of 26
Performance experiment - findings
Performance benchmarking
Performance within ’acceptable’ limits for medium-sized collections
Ingestion performance NOT affected by collection scale
Performance generally degrades for collections > 12 800 objects
Performance degradation adversely affects information-discovery
services —Feed generation, full-text search and OAI-PMH data
provider
Comparison with DSpace 3.1
Ingestion performance better than DSpace
Information discovery operation —search and OAI-PMH— are slower
than DSpace
DSpace uses Apache Solr for index
Comparable speeds can be attained through integration with third-party
search services
23 of 26
Conclusions and future work
Conclusions
Feasibility of simple DL architectures
Simplicity does not affect flexibility and potential extensibility of result
tools and services
Performance acceptable for small- and medium-sized collections
Comparable features with well-established solutions
Reference implementation
Packaging
Version control
24 of 26
Thank You
Questions?
Additional Information
http://dl.cs.uct.ac.za

Más contenido relacionado

La actualidad más candente

Nature Inspired Models And The Semantic Web
Nature Inspired Models And The Semantic WebNature Inspired Models And The Semantic Web
Nature Inspired Models And The Semantic WebStefan Ceriu
 
No Sql On Social And Sematic Web
No Sql On Social And Sematic WebNo Sql On Social And Sematic Web
No Sql On Social And Sematic WebStefan Ceriu
 
Towards a low cost etl system
Towards a low cost etl systemTowards a low cost etl system
Towards a low cost etl systemijdms
 
Data stream mining techniques: a review
Data stream mining techniques: a reviewData stream mining techniques: a review
Data stream mining techniques: a reviewTELKOMNIKA JOURNAL
 
Cold start recommendation with provable guarantees a decoupled approach
Cold start recommendation with provable guarantees a decoupled approachCold start recommendation with provable guarantees a decoupled approach
Cold start recommendation with provable guarantees a decoupled approachieeechennai
 

La actualidad más candente (7)

Nature Inspired Models And The Semantic Web
Nature Inspired Models And The Semantic WebNature Inspired Models And The Semantic Web
Nature Inspired Models And The Semantic Web
 
No Sql On Social And Sematic Web
No Sql On Social And Sematic WebNo Sql On Social And Sematic Web
No Sql On Social And Sematic Web
 
Towards a low cost etl system
Towards a low cost etl systemTowards a low cost etl system
Towards a low cost etl system
 
A new link based approach for categorical data clustering
A new link based approach for categorical data clusteringA new link based approach for categorical data clustering
A new link based approach for categorical data clustering
 
Data stream mining techniques: a review
Data stream mining techniques: a reviewData stream mining techniques: a review
Data stream mining techniques: a review
 
Cold start recommendation with provable guarantees a decoupled approach
Cold start recommendation with provable guarantees a decoupled approachCold start recommendation with provable guarantees a decoupled approach
Cold start recommendation with provable guarantees a decoupled approach
 
Ijcatr04071003
Ijcatr04071003Ijcatr04071003
Ijcatr04071003
 

Destacado

Digital projects best practices [xxxiii reunión nacional de archivos 201111]
Digital projects best practices [xxxiii reunión nacional de archivos 201111]Digital projects best practices [xxxiii reunión nacional de archivos 201111]
Digital projects best practices [xxxiii reunión nacional de archivos 201111]Frederick Zarndt
 
Noblessner / Estonian Design Awards 2014
Noblessner / Estonian Design Awards 2014Noblessner / Estonian Design Awards 2014
Noblessner / Estonian Design Awards 2014Designawards
 
Digar - Digital archive of the National Library / Estonian Design Awards 2014
Digar - Digital archive of the National Library / Estonian Design Awards 2014Digar - Digital archive of the National Library / Estonian Design Awards 2014
Digar - Digital archive of the National Library / Estonian Design Awards 2014Designawards
 
RDFa-deployed Multimedia Metadata - Tutorial, Part 2
RDFa-deployed Multimedia Metadata - Tutorial, Part 2RDFa-deployed Multimedia Metadata - Tutorial, Part 2
RDFa-deployed Multimedia Metadata - Tutorial, Part 2Michael Hausenblas
 
IAML Antwerp 2014 From historical collections to metadata
IAML Antwerp 2014 From historical collections to metadataIAML Antwerp 2014 From historical collections to metadata
IAML Antwerp 2014 From historical collections to metadataKaren McAulay
 
DESIGN AND IMPLEMENTATION OF A DIGITAL ARCHIVE OF LEARNING OBJECTS FOR REMOTE...
DESIGN AND IMPLEMENTATION OF A DIGITAL ARCHIVE OF LEARNING OBJECTS FOR REMOTE...DESIGN AND IMPLEMENTATION OF A DIGITAL ARCHIVE OF LEARNING OBJECTS FOR REMOTE...
DESIGN AND IMPLEMENTATION OF A DIGITAL ARCHIVE OF LEARNING OBJECTS FOR REMOTE...Khalid Md Saifuddin
 
Planning and Managing Digital Library & Archive Projects
Planning and Managing Digital Library & Archive ProjectsPlanning and Managing Digital Library & Archive Projects
Planning and Managing Digital Library & Archive Projectsac2182
 

Destacado (11)

Digital projects best practices [xxxiii reunión nacional de archivos 201111]
Digital projects best practices [xxxiii reunión nacional de archivos 201111]Digital projects best practices [xxxiii reunión nacional de archivos 201111]
Digital projects best practices [xxxiii reunión nacional de archivos 201111]
 
Noblessner / Estonian Design Awards 2014
Noblessner / Estonian Design Awards 2014Noblessner / Estonian Design Awards 2014
Noblessner / Estonian Design Awards 2014
 
Digar - Digital archive of the National Library / Estonian Design Awards 2014
Digar - Digital archive of the National Library / Estonian Design Awards 2014Digar - Digital archive of the National Library / Estonian Design Awards 2014
Digar - Digital archive of the National Library / Estonian Design Awards 2014
 
LBPL Digital Archive
LBPL Digital ArchiveLBPL Digital Archive
LBPL Digital Archive
 
RDFa-deployed Multimedia Metadata - Tutorial, Part 2
RDFa-deployed Multimedia Metadata - Tutorial, Part 2RDFa-deployed Multimedia Metadata - Tutorial, Part 2
RDFa-deployed Multimedia Metadata - Tutorial, Part 2
 
IAML Antwerp 2014 From historical collections to metadata
IAML Antwerp 2014 From historical collections to metadataIAML Antwerp 2014 From historical collections to metadata
IAML Antwerp 2014 From historical collections to metadata
 
DESIGN AND IMPLEMENTATION OF A DIGITAL ARCHIVE OF LEARNING OBJECTS FOR REMOTE...
DESIGN AND IMPLEMENTATION OF A DIGITAL ARCHIVE OF LEARNING OBJECTS FOR REMOTE...DESIGN AND IMPLEMENTATION OF A DIGITAL ARCHIVE OF LEARNING OBJECTS FOR REMOTE...
DESIGN AND IMPLEMENTATION OF A DIGITAL ARCHIVE OF LEARNING OBJECTS FOR REMOTE...
 
Management of library and archive in digital era
Management of library and archive in digital eraManagement of library and archive in digital era
Management of library and archive in digital era
 
Making a Case for Photo Metadata
Making a Case for Photo MetadataMaking a Case for Photo Metadata
Making a Case for Photo Metadata
 
Planning and Managing Digital Library & Archive Projects
Planning and Managing Digital Library & Archive ProjectsPlanning and Managing Digital Library & Archive Projects
Planning and Managing Digital Library & Archive Projects
 
Metadata Workshop
Metadata WorkshopMetadata Workshop
Metadata Workshop
 

Similar a Flexible Design for Simple Digital Library Tools and Services

An Overview of VIEW
An Overview of VIEWAn Overview of VIEW
An Overview of VIEWShiyong Lu
 
Azure Databricks for Data Scientists
Azure Databricks for Data ScientistsAzure Databricks for Data Scientists
Azure Databricks for Data ScientistsRichard Garris
 
SAICSIT 2011 Poster Abstract
SAICSIT 2011 Poster AbstractSAICSIT 2011 Poster Abstract
SAICSIT 2011 Poster AbstractLighton Phiri
 
Lecture_1_Intro.pdf
Lecture_1_Intro.pdfLecture_1_Intro.pdf
Lecture_1_Intro.pdfpaijitk
 
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...theijes
 
Research data spring: a consortial approach to RDM within SaS
Research data spring: a consortial approach to RDM within SaSResearch data spring: a consortial approach to RDM within SaS
Research data spring: a consortial approach to RDM within SaSJisc RDM
 
Evaluation Scheme & Detailed Syllabus of Information Technology & CSI 3rd Yea...
Evaluation Scheme & Detailed Syllabus of Information Technology & CSI 3rd Yea...Evaluation Scheme & Detailed Syllabus of Information Technology & CSI 3rd Yea...
Evaluation Scheme & Detailed Syllabus of Information Technology & CSI 3rd Yea...muazkhan7253
 
Integrated research data management in the Structural Sciences
Integrated research data management in the Structural SciencesIntegrated research data management in the Structural Sciences
Integrated research data management in the Structural SciencesManjulaPatel
 
M.tech cse 10july13 (1)
M.tech cse  10july13 (1)M.tech cse  10july13 (1)
M.tech cse 10july13 (1)vijay707070
 
DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!dclsocialmedia
 
Acceleration of XML Parsing through Prefetching
Acceleration of XML  Parsing through PrefetchingAcceleration of XML  Parsing through Prefetching
Acceleration of XML Parsing through PrefetchingRohit Deshpande
 
Data base and data warehouse
Data base and data warehouse Data base and data warehouse
Data base and data warehouse MudduSahanagowda
 
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...Thomas Rones
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesASIS&T
 

Similar a Flexible Design for Simple Digital Library Tools and Services (20)

An Overview of VIEW
An Overview of VIEWAn Overview of VIEW
An Overview of VIEW
 
Azure Databricks for Data Scientists
Azure Databricks for Data ScientistsAzure Databricks for Data Scientists
Azure Databricks for Data Scientists
 
SAICSIT 2011 Poster Abstract
SAICSIT 2011 Poster AbstractSAICSIT 2011 Poster Abstract
SAICSIT 2011 Poster Abstract
 
Lecture_1_Intro.pdf
Lecture_1_Intro.pdfLecture_1_Intro.pdf
Lecture_1_Intro.pdf
 
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
 
Syllabus.pdf
Syllabus.pdfSyllabus.pdf
Syllabus.pdf
 
Ndsa 2013-abrams-integrating-repositories-for-data-sharing
Ndsa 2013-abrams-integrating-repositories-for-data-sharingNdsa 2013-abrams-integrating-repositories-for-data-sharing
Ndsa 2013-abrams-integrating-repositories-for-data-sharing
 
Research data spring: a consortial approach to RDM within SaS
Research data spring: a consortial approach to RDM within SaSResearch data spring: a consortial approach to RDM within SaS
Research data spring: a consortial approach to RDM within SaS
 
Evaluation Scheme & Detailed Syllabus of Information Technology & CSI 3rd Yea...
Evaluation Scheme & Detailed Syllabus of Information Technology & CSI 3rd Yea...Evaluation Scheme & Detailed Syllabus of Information Technology & CSI 3rd Yea...
Evaluation Scheme & Detailed Syllabus of Information Technology & CSI 3rd Yea...
 
Integrated research data management in the Structural Sciences
Integrated research data management in the Structural SciencesIntegrated research data management in the Structural Sciences
Integrated research data management in the Structural Sciences
 
M.tech cse 10july13 (1)
M.tech cse  10july13 (1)M.tech cse  10july13 (1)
M.tech cse 10july13 (1)
 
DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!
 
Acceleration of XML Parsing through Prefetching
Acceleration of XML  Parsing through PrefetchingAcceleration of XML  Parsing through Prefetching
Acceleration of XML Parsing through Prefetching
 
Data base and data warehouse
Data base and data warehouse Data base and data warehouse
Data base and data warehouse
 
Cal Essay
Cal EssayCal Essay
Cal Essay
 
Intro_2.ppt
Intro_2.pptIntro_2.ppt
Intro_2.ppt
 
Intro.ppt
Intro.pptIntro.ppt
Intro.ppt
 
Intro.ppt
Intro.pptIntro.ppt
Intro.ppt
 
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication Repositories
 

Más de Lighton Phiri

Enterprise Medical Imaging for Streamlined Radiological Diagnosis in Zambian...
Enterprise Medical Imaging for Streamlined Radiological Diagnosis  in Zambian...Enterprise Medical Imaging for Streamlined Radiological Diagnosis  in Zambian...
Enterprise Medical Imaging for Streamlined Radiological Diagnosis in Zambian...Lighton Phiri
 
User Centred Design and Implementation of Useful Picture Archiving and Commun...
User Centred Design and Implementation of Useful Picture Archiving and Commun...User Centred Design and Implementation of Useful Picture Archiving and Commun...
User Centred Design and Implementation of Useful Picture Archiving and Commun...Lighton Phiri
 
Enterprise Medical Imaging for Improved Radiological Workflows in Zambian Pub...
Enterprise Medical Imaging for Improved Radiological Workflows in Zambian Pub...Enterprise Medical Imaging for Improved Radiological Workflows in Zambian Pub...
Enterprise Medical Imaging for Improved Radiological Workflows in Zambian Pub...Lighton Phiri
 
Empirical Evaluation of ETD-ms Compliance for ETDs Harvested by the NDLTD Uni...
Empirical Evaluation of ETD-ms Compliance for ETDs Harvested by the NDLTD Uni...Empirical Evaluation of ETD-ms Compliance for ETDs Harvested by the NDLTD Uni...
Empirical Evaluation of ETD-ms Compliance for ETDs Harvested by the NDLTD Uni...Lighton Phiri
 
Enterprise Medical Imaging in Public Health Facilities in Zambia: Towards a U...
Enterprise Medical Imaging in Public Health Facilities in Zambia: Towards a U...Enterprise Medical Imaging in Public Health Facilities in Zambia: Towards a U...
Enterprise Medical Imaging in Public Health Facilities in Zambia: Towards a U...Lighton Phiri
 
Enterprise Medical Imaging in the Global South: Challenges and Opportunities
Enterprise Medical Imaging in the Global South: Challenges and OpportunitiesEnterprise Medical Imaging in the Global South: Challenges and Opportunities
Enterprise Medical Imaging in the Global South: Challenges and OpportunitiesLighton Phiri
 
Factors Influencing Co-Creation of Open Education Resources Using Learning Ob...
Factors Influencing Co-Creation of Open Education Resources Using Learning Ob...Factors Influencing Co-Creation of Open Education Resources Using Learning Ob...
Factors Influencing Co-Creation of Open Education Resources Using Learning Ob...Lighton Phiri
 
Discovering Insight from Scholarly Research Output in Higher Educational Inst...
Discovering Insight from Scholarly Research Output in Higher Educational Inst...Discovering Insight from Scholarly Research Output in Higher Educational Inst...
Discovering Insight from Scholarly Research Output in Higher Educational Inst...Lighton Phiri
 
DRGS OJS Training: Electronic Publishing Using Open Journal Systems
DRGS OJS Training: Electronic Publishing Using Open Journal SystemsDRGS OJS Training: Electronic Publishing Using Open Journal Systems
DRGS OJS Training: Electronic Publishing Using Open Journal SystemsLighton Phiri
 
OJS Training: Users and User Roles
OJS Training: Users and User RolesOJS Training: Users and User Roles
OJS Training: Users and User RolesLighton Phiri
 
OJS Training: Journal Settings and Configuration
OJS Training: Journal Settings and ConfigurationOJS Training: Journal Settings and Configuration
OJS Training: Journal Settings and ConfigurationLighton Phiri
 
OJS Training: Managing The Submission Process
OJS Training: Managing The Submission ProcessOJS Training: Managing The Submission Process
OJS Training: Managing The Submission ProcessLighton Phiri
 
OJS Training: Creating and Managing Journal Issues
OJS Training: Creating and Managing Journal IssuesOJS Training: Creating and Managing Journal Issues
OJS Training: Creating and Managing Journal IssuesLighton Phiri
 
Improved Discoverability of Digital Objects in Institutional Repositories Usi...
Improved Discoverability of Digital Objects in Institutional Repositories Usi...Improved Discoverability of Digital Objects in Institutional Repositories Usi...
Improved Discoverability of Digital Objects in Institutional Repositories Usi...Lighton Phiri
 
Using Machine Learning Techniques for Solving Locally Relevant Problems
Using Machine Learning Techniques for Solving Locally Relevant ProblemsUsing Machine Learning Techniques for Solving Locally Relevant Problems
Using Machine Learning Techniques for Solving Locally Relevant ProblemsLighton Phiri
 
Effective Ingestion of Digital Objects in Institutional Repositories Using Su...
Effective Ingestion of Digital Objects in Institutional Repositories Using Su...Effective Ingestion of Digital Objects in Institutional Repositories Using Su...
Effective Ingestion of Digital Objects in Institutional Repositories Using Su...Lighton Phiri
 
Institutional Repository Single Sources of Truth
Institutional Repository Single Sources of TruthInstitutional Repository Single Sources of Truth
Institutional Repository Single Sources of TruthLighton Phiri
 
Improved Scholarly Communication Using Machine Learning
Improved Scholarly Communication Using Machine LearningImproved Scholarly Communication Using Machine Learning
Improved Scholarly Communication Using Machine LearningLighton Phiri
 
Open Access Electronic Publishing for Increased Online Visibility: Tooling Ch...
Open Access Electronic Publishing for Increased Online Visibility: Tooling Ch...Open Access Electronic Publishing for Increased Online Visibility: Tooling Ch...
Open Access Electronic Publishing for Increased Online Visibility: Tooling Ch...Lighton Phiri
 
A Multi-Faceted Multi-Stakeholder Approach for Increased Visibility of ETDs i...
A Multi-Faceted Multi-Stakeholder Approach for Increased Visibility of ETDs i...A Multi-Faceted Multi-Stakeholder Approach for Increased Visibility of ETDs i...
A Multi-Faceted Multi-Stakeholder Approach for Increased Visibility of ETDs i...Lighton Phiri
 

Más de Lighton Phiri (20)

Enterprise Medical Imaging for Streamlined Radiological Diagnosis in Zambian...
Enterprise Medical Imaging for Streamlined Radiological Diagnosis  in Zambian...Enterprise Medical Imaging for Streamlined Radiological Diagnosis  in Zambian...
Enterprise Medical Imaging for Streamlined Radiological Diagnosis in Zambian...
 
User Centred Design and Implementation of Useful Picture Archiving and Commun...
User Centred Design and Implementation of Useful Picture Archiving and Commun...User Centred Design and Implementation of Useful Picture Archiving and Commun...
User Centred Design and Implementation of Useful Picture Archiving and Commun...
 
Enterprise Medical Imaging for Improved Radiological Workflows in Zambian Pub...
Enterprise Medical Imaging for Improved Radiological Workflows in Zambian Pub...Enterprise Medical Imaging for Improved Radiological Workflows in Zambian Pub...
Enterprise Medical Imaging for Improved Radiological Workflows in Zambian Pub...
 
Empirical Evaluation of ETD-ms Compliance for ETDs Harvested by the NDLTD Uni...
Empirical Evaluation of ETD-ms Compliance for ETDs Harvested by the NDLTD Uni...Empirical Evaluation of ETD-ms Compliance for ETDs Harvested by the NDLTD Uni...
Empirical Evaluation of ETD-ms Compliance for ETDs Harvested by the NDLTD Uni...
 
Enterprise Medical Imaging in Public Health Facilities in Zambia: Towards a U...
Enterprise Medical Imaging in Public Health Facilities in Zambia: Towards a U...Enterprise Medical Imaging in Public Health Facilities in Zambia: Towards a U...
Enterprise Medical Imaging in Public Health Facilities in Zambia: Towards a U...
 
Enterprise Medical Imaging in the Global South: Challenges and Opportunities
Enterprise Medical Imaging in the Global South: Challenges and OpportunitiesEnterprise Medical Imaging in the Global South: Challenges and Opportunities
Enterprise Medical Imaging in the Global South: Challenges and Opportunities
 
Factors Influencing Co-Creation of Open Education Resources Using Learning Ob...
Factors Influencing Co-Creation of Open Education Resources Using Learning Ob...Factors Influencing Co-Creation of Open Education Resources Using Learning Ob...
Factors Influencing Co-Creation of Open Education Resources Using Learning Ob...
 
Discovering Insight from Scholarly Research Output in Higher Educational Inst...
Discovering Insight from Scholarly Research Output in Higher Educational Inst...Discovering Insight from Scholarly Research Output in Higher Educational Inst...
Discovering Insight from Scholarly Research Output in Higher Educational Inst...
 
DRGS OJS Training: Electronic Publishing Using Open Journal Systems
DRGS OJS Training: Electronic Publishing Using Open Journal SystemsDRGS OJS Training: Electronic Publishing Using Open Journal Systems
DRGS OJS Training: Electronic Publishing Using Open Journal Systems
 
OJS Training: Users and User Roles
OJS Training: Users and User RolesOJS Training: Users and User Roles
OJS Training: Users and User Roles
 
OJS Training: Journal Settings and Configuration
OJS Training: Journal Settings and ConfigurationOJS Training: Journal Settings and Configuration
OJS Training: Journal Settings and Configuration
 
OJS Training: Managing The Submission Process
OJS Training: Managing The Submission ProcessOJS Training: Managing The Submission Process
OJS Training: Managing The Submission Process
 
OJS Training: Creating and Managing Journal Issues
OJS Training: Creating and Managing Journal IssuesOJS Training: Creating and Managing Journal Issues
OJS Training: Creating and Managing Journal Issues
 
Improved Discoverability of Digital Objects in Institutional Repositories Usi...
Improved Discoverability of Digital Objects in Institutional Repositories Usi...Improved Discoverability of Digital Objects in Institutional Repositories Usi...
Improved Discoverability of Digital Objects in Institutional Repositories Usi...
 
Using Machine Learning Techniques for Solving Locally Relevant Problems
Using Machine Learning Techniques for Solving Locally Relevant ProblemsUsing Machine Learning Techniques for Solving Locally Relevant Problems
Using Machine Learning Techniques for Solving Locally Relevant Problems
 
Effective Ingestion of Digital Objects in Institutional Repositories Using Su...
Effective Ingestion of Digital Objects in Institutional Repositories Using Su...Effective Ingestion of Digital Objects in Institutional Repositories Using Su...
Effective Ingestion of Digital Objects in Institutional Repositories Using Su...
 
Institutional Repository Single Sources of Truth
Institutional Repository Single Sources of TruthInstitutional Repository Single Sources of Truth
Institutional Repository Single Sources of Truth
 
Improved Scholarly Communication Using Machine Learning
Improved Scholarly Communication Using Machine LearningImproved Scholarly Communication Using Machine Learning
Improved Scholarly Communication Using Machine Learning
 
Open Access Electronic Publishing for Increased Online Visibility: Tooling Ch...
Open Access Electronic Publishing for Increased Online Visibility: Tooling Ch...Open Access Electronic Publishing for Increased Online Visibility: Tooling Ch...
Open Access Electronic Publishing for Increased Online Visibility: Tooling Ch...
 
A Multi-Faceted Multi-Stakeholder Approach for Increased Visibility of ETDs i...
A Multi-Faceted Multi-Stakeholder Approach for Increased Visibility of ETDs i...A Multi-Faceted Multi-Stakeholder Approach for Increased Visibility of ETDs i...
A Multi-Faceted Multi-Stakeholder Approach for Increased Visibility of ETDs i...
 

Último

Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptshraddhaparab530
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
Dust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEDust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEaurabinda banchhor
 

Último (20)

Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
Dust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEDust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSE
 

Flexible Design for Simple Digital Library Tools and Services

  • 1. Flexible Design for Simple Digital Library Tools and Services Lighton Phiri Hussein Suleman Digital Libraries Laboratory Department of Computer Science University of Cape Town October 8, 2013
  • 5. Contextual overview Problem and motivation Preservation costs Technical skills and education Computing resources Proposed solution Simplicity and minimalism Successes of minimalism —Project Gutenburg Principled DL design Prior work Derivation of design principles Repository implementation Real-world case studies 5 of 26
  • 6. Repository prototype architectural design File-based Digital objects stored on native operating system Hierarchical collection structure Metadata objects Plain text files Encoded using Dublin Core Relationships modelled using metadata elements Object organisation Metadata records stored alongside objects Content objects and container objects nested within other container objects 6 of 26
  • 7. User study experiment Objective Developer-oriented Simplicity and flexibility of file-based store Target population 34 Computer Science honours students 12 groups of twos and threes Skillset Technologies relevant to study —DBMS, XML, Web apps Storage solutions Digital Libraries concepts Approach Subjects tasked to build layered services using file-based store Marks awarded for innovation —among other facets Subjects answered post-experiment survey 7 of 26
  • 8. User study experiment - results Survey participants 76% response rate —representation from all 12 groups Group Web service Candidates Respondents Group 1 Transcription 3 3 Group 2 Downloader 3 3 Group 3 Commenting 3 1 Group 4 Visualisation 3 2 Group 5 Transcription 3 2 Group 6 Annotation 2 2 Group 7 Visualisation 3 3 Group 8 Browsing 3 3 Group 9 Annotation 3 2 Group 10 Rating 3 2 Group 11 Gestures 3 1 Group 12 Visualisation 2 2 8 of 26
  • 9. User study experiment - results (1) Programming languages usage C# Java Python HTML5 PHP JavaScript 0 5 10 15 Number of subjects Programminglanguages 9 of 26
  • 10. User study experiment - results (2) Simplicity Understandability Metadata Structure Metadata Structure 0 5 10 15 20 25 Number of subjects Repositoryaspects Strongly agree Agree Neutral Disagree Strongly disagree 10 of 26
  • 11. User study experiment - results (3) Users asked to rank storage solutions in order of preference What aspects of your most preferred solution [database] above do you find particularly valuable? “I understand databases better.” “Simple to set up and sheer control” “Easy setup and connection to MySQL database” “Ease of data manipulation and relations” “Centralised management, ease of design, availability of support/literature” “The existing infrastructure for storing and retrieving data” 11 of 26
  • 12. User study experiment - results (4) Do you have any general comments about the data structure or format? “Had some difficulty working the metadata, despite looking at how to process DC metadata online, it slowed us down considerably.” “Good structure although confusing that each page has no metadata of its own(only the story).” “The hierarchy was not intuitive therefore took a while to understand however having crossed that hurdle was fairly easy to process.” “I guess it was OK but took some getting used to” 12 of 26
  • 13. User study experiment - findings Simplicity resulted in more understandable structure 69% agreed that XML-files were simple 61% found XML format easy to work with 62% found hierarchical structure simple to work with 46% found hierarchical structure easily understandable Simplicity does not affect flexibility of interaction with file-store No influence on choice of language Only 15% of subjects thought it did 13 of 26
  • 14. Performance experiment Objective Assess performance relative to collection size Test Environment Pentium(R) Dual-Core CPU E5200@ 2.50GHz; 4GB RAM 32 bit Ubuntu 12.01 LTS Siege and ApacheBench for benchmarking Metrics Response time Factors Collection hierarchical structure Collection size —digital objects 14 of 26
  • 15. Performance experiment - test dataset NDLTD Union Catalog —http://union.ndltd.org/OAI-PMH Harvested 1 907 000 metadata records Dublin Core-encoded plain text files Linearly increasing workload Workload Objects Cols Size [MB] W1 100 19 0.54 W2 200 25 1.00 W3 400 42 2.00 ... ... ... ... W13 409 600 128 1945.00 W14 819 200 131 3788.80 W15 1 638 400 131 7680.00 15 of 26
  • 16. Performance experiment - test dataset (2) Two datasets spawned from initial dataset one-, two- and three-level structures NDLTD OCLC ... object ... ... ... ... (a) Dataset #1 NDLTD OCLC 2010 ... object ... ... ... (b) Dataset #2 NDLTD OCLC 2010 z ... object ... ... (c) Dataset #3 16 of 26
  • 17. Performance experiment - evaluation aspects Transaction log analysis —http://pubs.cs.uct.ac.za Ingestion Full-text search Indexing operations OAI-PMH data provider Feed generation 17 of 26
  • 18. Performance experiment - experimental design Performance benchmarking Evaluation aspects Three-run averages for all scenarios Datasets #1, #2 and #3 15 workloads Break-even points for performance degradation Nielsen’s three important limits for response times Performance comparisons Benchmark results vs DSpace 3.1 Ingestion Full-text search OAI-PMH data provider 18 of 26
  • 19. Performance experiment - results Item ingestion 2.5 2.6 2.7 2.8 2.9 3.0 100 200 400 800 1.6k 3.2k 6.4k 12.8k 25.6k 51.2k 102.4k 204.8k 409.6k 819.2k 1638.4k Workload size Time[ms] Dataset #1 Dataset #2 Dataset #3 19 of 26
  • 20. Performance experiment - results (2) Full-text search 100 105 1010 1015 100 200 400 800 1.6k 3.2k 6.4k 12.8k 25.6k 51.2k 102.4k 204.8k 409.6k 819.2k 1638.4k Workload size log10(Time[ms]) Traversal time Parsing time XPath time 20 of 26
  • 21. Performance experiment - results (3) OAI-PMH data provider 10-2 10-1 100 101 102 103 100 200 400 800 1.6k 3.2k 6.4k 12.8k 25.6k 51.2k 102.4k 204.8k 409.6k 819.2k 1638.4k Workload size log10(Time[ms]) GetRecord ListIdentifiers ListRecords ListSets 21 of 26
  • 22. Performance experiment - results (4) Index Ingest OAI-PMH Feed Search 100 102 104 106 log10(Time[ms]) 100 200 400 800 1.6k 3.2k 6.4k 12.8k 25.6k 51.2k 102.4k 204.8k 409.6k 819.2k 1638.4k 22 of 26
  • 23. Performance experiment - findings Performance benchmarking Performance within ’acceptable’ limits for medium-sized collections Ingestion performance NOT affected by collection scale Performance generally degrades for collections > 12 800 objects Performance degradation adversely affects information-discovery services —Feed generation, full-text search and OAI-PMH data provider Comparison with DSpace 3.1 Ingestion performance better than DSpace Information discovery operation —search and OAI-PMH— are slower than DSpace DSpace uses Apache Solr for index Comparable speeds can be attained through integration with third-party search services 23 of 26
  • 24. Conclusions and future work Conclusions Feasibility of simple DL architectures Simplicity does not affect flexibility and potential extensibility of result tools and services Performance acceptable for small- and medium-sized collections Comparable features with well-established solutions Reference implementation Packaging Version control 24 of 26