SlideShare una empresa de Scribd logo
1 de 22
Implementing a funder repository
     with heterogeneous material
and advanced presentation capabilities

 Ioanna-Ourania Stathopoulou, Nikos Houssos, Panagiotis Stathopoulos,
              Despina Hardouveli, Alexandra Roubani,
            Alexandros Soumplis, Chrysostomos Nanakos

 National Documentation Centre / National Hellenic Research Foundation, Greece




        7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
Agenda
•   Aims and scope of the project
•   Methodology and approach
•   Implementation / DSpace extensions
•   Conclusions – future work




         7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
Project context
• Hellenic Ministry of Education programmes for
  education and lifelong learning, co-funded by the
  EU structural funds
   – Three phases: 1994-1999, 2000-2006, 2007-2013
   – Overall funds in the range of billion Euros
• Activities funded:
   –   Face-to-face courses / seminars
   –   Creation of educational material
   –   Studies, surveys, reports
   –   Conferences, workshops
   –   Research: theses, collaborative projects

           7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
Aims and scope of the project
• Decision by funding authority to create a
  repository (autumn 2010)
• Aims: Dissemination, reuse and preservation of the
  digital material produced in the frame of the
  funded programmes
• Single point of (persistent) access to material,
  standards-based documentation/metadata,
  interoperability, link of content to project / funding
  programme
• Repository available (since spring 2011)
   – http://repository.edulll.gr

          7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
Project inputs and outputs
• Input material:
  – Digital files arranged in folders – (hopefully) according
    to project
  – List of projects
  – No metadata!
• Output aimed for:
  – Repository with standards compliant metadata records
    along with processed digital files, suitable for
    dissemination
  – User-friendly interface for human users
  – Programmatic interface (OAI-PMH) for reuse by third-
    party software applications
         7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
Methodology and approach
• Investigate input material
   – Identify content of archival value – exclude all other material
     (e.g. administrative documents)
   – Assign priorities to content
• Documentation / cataloguing of material according to
  standards – application profile created
   – Subject cataloguing
   – No self-archiving at this phase
• Digital processing of input files to produce copies
  suitable for preservation and dissemination – both initial
  and processed files kept in repository
• A lot of effort allocated to user-friendly interfaces for
  browsing metadata and accessing content
          7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
Metadata schema
• Custom application profile developed based on
  common international schemata
  –   Qualified Dublin Core
  –   LOM
  –   PREMIS
  –   CERIF
• Vocabularies used
  – Eurovoc for thematic indexing – due to inter-
    disciplinarity and multi-linguality
  – Eurydice vocabulary for education levels

          7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
Metadata schema
• Custom application profile developed based on
  common international schemata
  –   Qualified Dublin Core
  –   LOM
  –   PREMIS
  –   CERIF
• Vocabularies used
  – Eurovoc for thematic indexing – due to inter-
    disciplinarity and multi-linguality
  – Eurydice vocabulary for education levels

          7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
Processing workflow

     Create metadata record.                                       Handle assigned to record.
    Upload initial digital files to                                Metadata record published.
            repository.                                              Digital files not visible.



                      Digital file processing
                           instructions



       Digital file processing
     (format transformations,                                   Publication of processed digital files
merging/splitting, OCR, file rename)




           7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
DSpace extensions
• Submission form enhancements
• User-friendly browsing (tagcloud, image-based
  type browsing)
• Embedding of video files
• Online reading / streaming of text material




        7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
Submission form enhancements
• Simplify procedure, fewer steps
• Enter metadata and upload multiple files from
  single form
• Subsequently edit metadata from submission
  form
• Multi-lingual controlled vocabularies with user-
  friendly values
• Various new widgets (e.g. language selection
  for fields, name processing)
        7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
User-friendly browsing
• Tag cloud for subjects
• Configurable from dspace.dfg – can be used for
  any other field indexed for browsing
• Image-based browsing for content types




        7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
Configurable tag cloud (examples)
# Select the browse index to create a tag cloud for in the home page
# Possible values: any of the browse indices
webui.tagcloud.home.bindex = subject
#
# Select the total tags to show
# Possible values: any integer from 1 to infinity
webui.tagcloud.home.maxtags = 50
#
# Should display the score next to each tag?
# Possible values: true | false
webui.tagcloud.home.showscore = false
#
# The score that tags with lower than that will not appear in the rag
cloud
# Possible values: any integer from 1 to infinity
webui.tagcloud.home.cutlevel = 5


          7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
Embedding video files
• H.264 enconding using a F4V video file
  container
• Video stream provided by Woowza
• Flowplayer selected from embedding video on
  metadata record page on repository
• Video-related fields included to the metadata
  submission/edit form


        7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
Online reader for text
• No need to download big files
• Bookmarking/sharing at the page level
• User access statistics at the page level
• Compatible with multitouch interfaces / tablets
• Based on the Internet Archive Book Reader
• Pages are JPEG2000 files served by an image server
• Sophisticated infrastructure for back-end
  processing

         7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
Online reader backend
• Two-tier backend:
   – content delivery and
   – content transformation
• Content delivery:
   – Nginx/Varnish load balancer/caching to 2 node tomcat/djatoka cluster
     serving JPEG2000 over a shared SAN storage and utilising a 3 node
     Postgres 9 cluster
• Content transformation
   – “Shadow” backend with specialized multithreaded content types
     transformation server
       • After transformation content copied to the delivery backend
   – Interfaced to OJS/Dspace
   – Manages the automatic batch conversion of content
   – Available as OSS (jp2k-distiller) http://code.google.com/p/jp2k-
     distiller/
Jp2k-distiller & dpool – features
• Available as OSS (http://code.google.com/p/dpool/)
• Set of python scripts that schedules and manages batch conversion of
  content files to different formats, exploiting multithreaded batch
  conversion features.
    – Need to run multiple encoders/per server in order to achieve high sustainable
      conversion ratesortunity window lasts until the batch ends.
• Three (3) different conversion options over various Queues
    – PDF/TIFF → PNG Conversion Queue
    – PNG → JP2000 Conversion Queue
    – JP2000 → Publishing Queue
• Interfaced with OJS and DSpace over a HTTP interface
• New version: dpool (so radically improved we had to change names! )
    – Tool for modular batch handling of any type of content conversion
    – Truly multinode: a central dispatcher server and an artibitaty amount of
      distributed “slave” workers over multiple physical or virtual servers
    – Technologies: Python, ZeroMQ – for master-slave node communication,
      GlusterFS as common filestore, MongoDB for statistics, NodeJS – for web
      based dashboard
    – You can start with one conversion machine and scale to hundreds
Future work
• Support self-archiving (two phase cataloguing)
• Include CRIS functionality – through a separate
  CERIF based system
• Various new features for presentation of
  content




        7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
Thank you for your attention!
• More info:

                                iostath@ekt.gr
                               nhoussos AT ekt.gr
                                 pstath@ekt.gr




       7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012

Más contenido relacionado

Destacado

A Year In A Glance 07-08
A Year In A Glance 07-08A Year In A Glance 07-08
A Year In A Glance 07-08sulemanhadi
 
Using CERIF-based CRIS to support the academic and research community: emergi...
Using CERIF-based CRIS to support the academic and research community: emergi...Using CERIF-based CRIS to support the academic and research community: emergi...
Using CERIF-based CRIS to support the academic and research community: emergi...Nikos Houssos
 
Moodle 1.8.3
Moodle 1.8.3Moodle 1.8.3
Moodle 1.8.3ryger
 
Maravilla 10 19 07
Maravilla 10 19 07Maravilla 10 19 07
Maravilla 10 19 07guest4c225b
 
Reputation
ReputationReputation
Reputationpvijay
 
Non-sticky
Non-stickyNon-sticky
Non-stickypvijay
 
Vacation Station
Vacation StationVacation Station
Vacation Stationpookeylou
 
Self-renewal
Self-renewalSelf-renewal
Self-renewalpvijay
 
nhelzki DIRECTING NCM 105
nhelzki DIRECTING NCM 105nhelzki DIRECTING NCM 105
nhelzki DIRECTING NCM 105xtrm nurse
 
Nursing Management for Diabetes Mellitus
Nursing Management for Diabetes MellitusNursing Management for Diabetes Mellitus
Nursing Management for Diabetes Mellitusxtrm nurse
 

Destacado (14)

Ideas
IdeasIdeas
Ideas
 
Earth Superb
Earth SuperbEarth Superb
Earth Superb
 
A Year In A Glance 07-08
A Year In A Glance 07-08A Year In A Glance 07-08
A Year In A Glance 07-08
 
Using CERIF-based CRIS to support the academic and research community: emergi...
Using CERIF-based CRIS to support the academic and research community: emergi...Using CERIF-based CRIS to support the academic and research community: emergi...
Using CERIF-based CRIS to support the academic and research community: emergi...
 
Moodle 1.8.3
Moodle 1.8.3Moodle 1.8.3
Moodle 1.8.3
 
Maravilla 10 19 07
Maravilla 10 19 07Maravilla 10 19 07
Maravilla 10 19 07
 
Reputation
ReputationReputation
Reputation
 
Spinal Injury
Spinal InjurySpinal Injury
Spinal Injury
 
Non-sticky
Non-stickyNon-sticky
Non-sticky
 
Vacation Station
Vacation StationVacation Station
Vacation Station
 
Chapt27
Chapt27Chapt27
Chapt27
 
Self-renewal
Self-renewalSelf-renewal
Self-renewal
 
nhelzki DIRECTING NCM 105
nhelzki DIRECTING NCM 105nhelzki DIRECTING NCM 105
nhelzki DIRECTING NCM 105
 
Nursing Management for Diabetes Mellitus
Nursing Management for Diabetes MellitusNursing Management for Diabetes Mellitus
Nursing Management for Diabetes Mellitus
 

Similar a Implementing a funder repository with heterogeneous material

The Heterogenous Zone: Six use cases for six research data collections in Edi...
The Heterogenous Zone: Six use cases for six research data collections in Edi...The Heterogenous Zone: Six use cases for six research data collections in Edi...
The Heterogenous Zone: Six use cases for six research data collections in Edi...EDINA, University of Edinburgh
 
The Oxford Common File Layout: A common approach to digital preservation
The Oxford Common File Layout: A common approach to digital preservationThe Oxford Common File Layout: A common approach to digital preservation
The Oxford Common File Layout: A common approach to digital preservationSimeon Warner
 
Why, why, why DELILA? A project to promote the open sharing of our informatio...
Why, why, why DELILA? A project to promote the open sharing of our informatio...Why, why, why DELILA? A project to promote the open sharing of our informatio...
Why, why, why DELILA? A project to promote the open sharing of our informatio...Centre for Distance Education
 
Open Science Days 2014 - Becker - Repositories and Linked Data
Open Science Days 2014 - Becker - Repositories and Linked DataOpen Science Days 2014 - Becker - Repositories and Linked Data
Open Science Days 2014 - Becker - Repositories and Linked DataPascal-Nicolas Becker
 
DELILA project overview
DELILA project overviewDELILA project overview
DELILA project overviewJane Secker
 
Software for data management and exploitation
Software for data management and exploitationSoftware for data management and exploitation
Software for data management and exploitationEOSC-hub project
 
SWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic WebSWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic WebPascal-Nicolas Becker
 
James, Robertson & Bell - Why, why, why DELILA? A project to promote the open...
James, Robertson & Bell - Why, why, why DELILA? A project to promote the open...James, Robertson & Bell - Why, why, why DELILA? A project to promote the open...
James, Robertson & Bell - Why, why, why DELILA? A project to promote the open...IL Group (CILIP Information Literacy Group)
 
DSpace Update from Open Repositories 2014
DSpace Update from Open Repositories 2014DSpace Update from Open Repositories 2014
DSpace Update from Open Repositories 2014Repository Fringe
 
OpenAIRE guidelines for CRIS managers : supporting interoperability of open r...
OpenAIRE guidelines for CRIS managers : supporting interoperability of open r...OpenAIRE guidelines for CRIS managers : supporting interoperability of open r...
OpenAIRE guidelines for CRIS managers : supporting interoperability of open r...OpenAIRE
 
Open Data analysis with EOSC-hub services
Open Data analysis with EOSC-hub servicesOpen Data analysis with EOSC-hub services
Open Data analysis with EOSC-hub servicesOpenAIRE
 
Open Accessibility EverywhereGroundwork, Infrastructure, Standards
Open Accessibility EverywhereGroundwork, Infrastructure, StandardsOpen Accessibility EverywhereGroundwork, Infrastructure, Standards
Open Accessibility EverywhereGroundwork, Infrastructure, StandardsAEGIS-ACCESSIBLE Projects
 
Technical integration of data repositories status and challenges
Technical integration of data repositories status and challengesTechnical integration of data repositories status and challenges
Technical integration of data repositories status and challengesvty
 
Õpiobjektid ja repositooriumid
Õpiobjektid ja repositooriumidÕpiobjektid ja repositooriumid
Õpiobjektid ja repositooriumidHans Põldoja
 

Similar a Implementing a funder repository with heterogeneous material (20)

OR2012 Biblio-transformation-engine
OR2012 Biblio-transformation-engineOR2012 Biblio-transformation-engine
OR2012 Biblio-transformation-engine
 
The Heterogenous Zone: Six use cases for six research data collections in Edi...
The Heterogenous Zone: Six use cases for six research data collections in Edi...The Heterogenous Zone: Six use cases for six research data collections in Edi...
The Heterogenous Zone: Six use cases for six research data collections in Edi...
 
The Oxford Common File Layout: A common approach to digital preservation
The Oxford Common File Layout: A common approach to digital preservationThe Oxford Common File Layout: A common approach to digital preservation
The Oxford Common File Layout: A common approach to digital preservation
 
RDM@Edinburgh_interoperation_IDCC2015
RDM@Edinburgh_interoperation_IDCC2015RDM@Edinburgh_interoperation_IDCC2015
RDM@Edinburgh_interoperation_IDCC2015
 
Service Integration to Enhance RDM
Service Integration to Enhance RDMService Integration to Enhance RDM
Service Integration to Enhance RDM
 
Why, why, why DELILA? A project to promote the open sharing of our informatio...
Why, why, why DELILA? A project to promote the open sharing of our informatio...Why, why, why DELILA? A project to promote the open sharing of our informatio...
Why, why, why DELILA? A project to promote the open sharing of our informatio...
 
Open Science Days 2014 - Becker - Repositories and Linked Data
Open Science Days 2014 - Becker - Repositories and Linked DataOpen Science Days 2014 - Becker - Repositories and Linked Data
Open Science Days 2014 - Becker - Repositories and Linked Data
 
DELILA project overview
DELILA project overviewDELILA project overview
DELILA project overview
 
Edinburgh DataShare - DSpace for Data
Edinburgh DataShare - DSpace for DataEdinburgh DataShare - DSpace for Data
Edinburgh DataShare - DSpace for Data
 
Software for data management and exploitation
Software for data management and exploitationSoftware for data management and exploitation
Software for data management and exploitation
 
SWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic WebSWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic Web
 
James, Robertson & Bell - Why, why, why DELILA? A project to promote the open...
James, Robertson & Bell - Why, why, why DELILA? A project to promote the open...James, Robertson & Bell - Why, why, why DELILA? A project to promote the open...
James, Robertson & Bell - Why, why, why DELILA? A project to promote the open...
 
DSpace Update from Open Repositories 2014
DSpace Update from Open Repositories 2014DSpace Update from Open Repositories 2014
DSpace Update from Open Repositories 2014
 
RDM Programme @ Edinburgh - Service Interoperation
RDM Programme @ Edinburgh - Service InteroperationRDM Programme @ Edinburgh - Service Interoperation
RDM Programme @ Edinburgh - Service Interoperation
 
OpenAIRE guidelines for CRIS managers : supporting interoperability of open r...
OpenAIRE guidelines for CRIS managers : supporting interoperability of open r...OpenAIRE guidelines for CRIS managers : supporting interoperability of open r...
OpenAIRE guidelines for CRIS managers : supporting interoperability of open r...
 
Open Data analysis with EOSC-hub services
Open Data analysis with EOSC-hub servicesOpen Data analysis with EOSC-hub services
Open Data analysis with EOSC-hub services
 
Open Accessibility EverywhereGroundwork, Infrastructure, Standards
Open Accessibility EverywhereGroundwork, Infrastructure, StandardsOpen Accessibility EverywhereGroundwork, Infrastructure, Standards
Open Accessibility EverywhereGroundwork, Infrastructure, Standards
 
Technical integration of data repositories status and challenges
Technical integration of data repositories status and challengesTechnical integration of data repositories status and challenges
Technical integration of data repositories status and challenges
 
Õpiobjektid ja repositooriumid
Õpiobjektid ja repositooriumidÕpiobjektid ja repositooriumid
Õpiobjektid ja repositooriumid
 
Ppls mvm2
Ppls mvm2Ppls mvm2
Ppls mvm2
 

Último

Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 

Último (20)

Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 

Implementing a funder repository with heterogeneous material

  • 1. Implementing a funder repository with heterogeneous material and advanced presentation capabilities Ioanna-Ourania Stathopoulou, Nikos Houssos, Panagiotis Stathopoulos, Despina Hardouveli, Alexandra Roubani, Alexandros Soumplis, Chrysostomos Nanakos National Documentation Centre / National Hellenic Research Foundation, Greece 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
  • 2. Agenda • Aims and scope of the project • Methodology and approach • Implementation / DSpace extensions • Conclusions – future work 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
  • 3. Project context • Hellenic Ministry of Education programmes for education and lifelong learning, co-funded by the EU structural funds – Three phases: 1994-1999, 2000-2006, 2007-2013 – Overall funds in the range of billion Euros • Activities funded: – Face-to-face courses / seminars – Creation of educational material – Studies, surveys, reports – Conferences, workshops – Research: theses, collaborative projects 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
  • 4. Aims and scope of the project • Decision by funding authority to create a repository (autumn 2010) • Aims: Dissemination, reuse and preservation of the digital material produced in the frame of the funded programmes • Single point of (persistent) access to material, standards-based documentation/metadata, interoperability, link of content to project / funding programme • Repository available (since spring 2011) – http://repository.edulll.gr 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
  • 5. Project inputs and outputs • Input material: – Digital files arranged in folders – (hopefully) according to project – List of projects – No metadata! • Output aimed for: – Repository with standards compliant metadata records along with processed digital files, suitable for dissemination – User-friendly interface for human users – Programmatic interface (OAI-PMH) for reuse by third- party software applications 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
  • 6. Methodology and approach • Investigate input material – Identify content of archival value – exclude all other material (e.g. administrative documents) – Assign priorities to content • Documentation / cataloguing of material according to standards – application profile created – Subject cataloguing – No self-archiving at this phase • Digital processing of input files to produce copies suitable for preservation and dissemination – both initial and processed files kept in repository • A lot of effort allocated to user-friendly interfaces for browsing metadata and accessing content 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
  • 7. Metadata schema • Custom application profile developed based on common international schemata – Qualified Dublin Core – LOM – PREMIS – CERIF • Vocabularies used – Eurovoc for thematic indexing – due to inter- disciplinarity and multi-linguality – Eurydice vocabulary for education levels 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
  • 8. Metadata schema • Custom application profile developed based on common international schemata – Qualified Dublin Core – LOM – PREMIS – CERIF • Vocabularies used – Eurovoc for thematic indexing – due to inter- disciplinarity and multi-linguality – Eurydice vocabulary for education levels 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
  • 9. Processing workflow Create metadata record. Handle assigned to record. Upload initial digital files to Metadata record published. repository. Digital files not visible. Digital file processing instructions Digital file processing (format transformations, Publication of processed digital files merging/splitting, OCR, file rename) 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
  • 10. DSpace extensions • Submission form enhancements • User-friendly browsing (tagcloud, image-based type browsing) • Embedding of video files • Online reading / streaming of text material 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
  • 11. Submission form enhancements • Simplify procedure, fewer steps • Enter metadata and upload multiple files from single form • Subsequently edit metadata from submission form • Multi-lingual controlled vocabularies with user- friendly values • Various new widgets (e.g. language selection for fields, name processing) 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
  • 12. 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
  • 13. 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
  • 14. User-friendly browsing • Tag cloud for subjects • Configurable from dspace.dfg – can be used for any other field indexed for browsing • Image-based browsing for content types 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
  • 15. Configurable tag cloud (examples) # Select the browse index to create a tag cloud for in the home page # Possible values: any of the browse indices webui.tagcloud.home.bindex = subject # # Select the total tags to show # Possible values: any integer from 1 to infinity webui.tagcloud.home.maxtags = 50 # # Should display the score next to each tag? # Possible values: true | false webui.tagcloud.home.showscore = false # # The score that tags with lower than that will not appear in the rag cloud # Possible values: any integer from 1 to infinity webui.tagcloud.home.cutlevel = 5 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
  • 16. Embedding video files • H.264 enconding using a F4V video file container • Video stream provided by Woowza • Flowplayer selected from embedding video on metadata record page on repository • Video-related fields included to the metadata submission/edit form 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
  • 17. Online reader for text • No need to download big files • Bookmarking/sharing at the page level • User access statistics at the page level • Compatible with multitouch interfaces / tablets • Based on the Internet Archive Book Reader • Pages are JPEG2000 files served by an image server • Sophisticated infrastructure for back-end processing 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
  • 18. Online reader backend • Two-tier backend: – content delivery and – content transformation • Content delivery: – Nginx/Varnish load balancer/caching to 2 node tomcat/djatoka cluster serving JPEG2000 over a shared SAN storage and utilising a 3 node Postgres 9 cluster • Content transformation – “Shadow” backend with specialized multithreaded content types transformation server • After transformation content copied to the delivery backend – Interfaced to OJS/Dspace – Manages the automatic batch conversion of content – Available as OSS (jp2k-distiller) http://code.google.com/p/jp2k- distiller/
  • 19.
  • 20. Jp2k-distiller & dpool – features • Available as OSS (http://code.google.com/p/dpool/) • Set of python scripts that schedules and manages batch conversion of content files to different formats, exploiting multithreaded batch conversion features. – Need to run multiple encoders/per server in order to achieve high sustainable conversion ratesortunity window lasts until the batch ends. • Three (3) different conversion options over various Queues – PDF/TIFF → PNG Conversion Queue – PNG → JP2000 Conversion Queue – JP2000 → Publishing Queue • Interfaced with OJS and DSpace over a HTTP interface • New version: dpool (so radically improved we had to change names! ) – Tool for modular batch handling of any type of content conversion – Truly multinode: a central dispatcher server and an artibitaty amount of distributed “slave” workers over multiple physical or virtual servers – Technologies: Python, ZeroMQ – for master-slave node communication, GlusterFS as common filestore, MongoDB for statistics, NodeJS – for web based dashboard – You can start with one conversion machine and scale to hundreds
  • 21. Future work • Support self-archiving (two phase cataloguing) • Include CRIS functionality – through a separate CERIF based system • Various new features for presentation of content 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
  • 22. Thank you for your attention! • More info: iostath@ekt.gr nhoussos AT ekt.gr pstath@ekt.gr 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012