SlideShare una empresa de Scribd logo
1 de 25
Extraction and Visualization of Metadata
 Analytics for Multimedia Learning Object
Repositories: The case of TERENA TF-media
                 network

          Kostas Vogias1, Ilias Hatzakis1, Nikos
              Manouselis2, Peter Szegedi3

     1
         Greek Research and Technology Network
               2
                 Agro-Know Technologies
                   3
                     TERENA TF-media



     Workshop on Learning Object Analytics for Collections,
         Repositories and Federations, 9 April, 2013
Metadata analysis is not something new

•   Ochoa, Xavier, Klerkx, Joris, Vandeputte, Bram, and Duval, Erik.
     –   On the Use of Learning Object Metadata: The GLOBE Experience.
          • Made the first fully quantitative study in a large number of Learning
            Repositories that belongs to a large organization like Globe


•   Neven, Filip and Duval, Erik.
     – Reusable Learning Objects: a Survey of LOM-Based Repositories.


•   Zschocke, Thomas and Beniest, Jan and Paisley, Courtney and
    Najjar, Jehad and Duval, Erik.
     – The LOM application profile for agricultural learning resources of the CGIAR
          • studied the use of LOM for the indexing of learning resources


•   Manouselis, N, Salokhe, G, Keizer, J, and Rudgard, S.
     – Towards a Harmonization of Metadata Application Profiles for Agricultural
       Learning Repositories.
          • Made an analysis of the metadata schemas used by Repositories including
            Agricultural Learning Resources.
Why extraction of metadata analytics is
important when we are developing a learning
                  portal

 Which metadata schema is used by our content providers?


     How the providers are using different elements of the schema?


 On which metadata schema our learning portal should rely?

    Can we provide services based on metadata elements such as
            Portal design decisions
            Portal design decisions
    Subject, Type, Format, Keywords, Title and Descriptions?


 Which languages can our portal support?
Study Objectives

• To perform a quantified study on the different metadata
  schemas used by TERENA TF media network

• To propose a metadata schema on which the TERENA OER
  portal will be based

• To verify if metadata analytics can constitute a tool that
  can facilitate the development of a learning portal that is
  based on metadata records aggregated by various content
  providers
What Terena OER Project is


• A European level metadata aggregation portal for Open
  Educational Resources (primarily audiovisual contents,
  recorded lectures) collected and maintained by institutional
  and national content repositories of the Research &
  Education Community.
• Main objectives of the project
   – Create a broker for national learning resource organizations.
   – Bridge the gap between the national repositories and the
     emerging global repositories (e.g., GLOBE) by establishing a
     European level metadata repository (i.e. aggregation point)
     for the national repositories acting in the R&E community. The
     European level repository will be a metadata repository only,
     the content remains in its original content repository.
Which Data Providers


• Successfully harvested.
Repository Name                     Records Harvested   Metadata Schema
Switch Collection                   619                 oai_dc

DSpace at University Of Latvia      1009                oai_dc

OBAA Repository                     56                  oai_dc

RiuNet: Repositori Institucional    21902               oai_dc
de la Universitat Politècnica de   58.000 + instances
València
SCAM Repository                     7351                oai_lom

Material Audiovisual ofrecido por   978                 oai_dc
el Campus do Mar

wikiwijs                            26054               oai_dc

Małopolskie Towarzystwo             181                 oai_dc
Genealogiczne



Select the ones that can be used for the first version of TERENA OER
portal
how and what we used
HOW
• Repository Based Analysis.
• Metrics
   – Element Completeness: The percentage of records in which
     an element has a value.
   – Relative Entropy: Diversity of values in an element.
   – Vocabulary values distribution: Format, Language and
     Type
   – Language properties: Attribute (e.g. lang=en) value usage
     frequency e.g. lang in free text metadata elements Title,
     Description and Keyword
• The analysis was performed for a core set of metadata
  elements that is present in the studied repositories
• Use of a standard set of mappings from DC to LOM
What we have used

• ARIADNE Harvester

• A metadata analysis tool
   – Implemented using JAVA.
   – Metadata schema agnostic.
   – Metadata Analysis schemes:
      1. Repository based.
      2. Federation based.
   – Element based analysis:
      •   Completeness
      •   Relative Entropy
      •   Specific element vocabulary extraction and usage frequency
   – Attribute based analysis.
      •   Lang value attribute frequency
   – Input:XMLs
   – Output:CSV,TXT
Current version of the tool
Next version of the tool
RESULTS




results
elements usage
Element completeness
                                                           Campus do
Element       Switch   UOL      OBAA     RiuNet   SCAM
                                                              Mar
                                                                     wikiwijs   Malopolskie

   Title       52.62   100.00   100.00   100.00   99.96      100.00    100.00     100.00
Description    47.67    0.10     0.00     0.17    99.96      100.00     0.00       99.44
  Subject      52.45   100.00    0.00    96.84    10.49      100.00     0.00      100.00
  Format       0.00     0.00     0.00     0.00    73.31      100.00     0.00       98.89
 Identifier    52.62   100.00   100.00   100.00   100.00     97.24      0.00      100.00
Language       3.10    95.34     0.00    99.91    92.45      100.00    100.00     100.00
   Type        52.33   99.31     0.00    96.57    74.12      100.00    100.00     100.00
 Publisher     47.38   44.84     0.00    43.97    65.43      100.00    100.00      97.78
 Creator       52.45   92.76    89.09    99.67    65.43      100.00     0.00       32.78
  Rights       52.62    0.20     0.00    100.00   89.27      100.00    96.07       0.00
   Date        0.00    31.15    100.00    6.38    45.89      100.00     0.00       98.89
 Relation      0.00    15.87     0.00     3.83     0.00       0.00      0.00       47.22
 Coverage      0.00    67.66     0.00    97.94     0.00       0.00      0.00      100.00
  Source       51.85    0.00     0.00    100.00   100.00      0.00      0.00       99.44
Average Element Usage


               Define the mandatory elements
                for the schema of metadata
                         aggregator
which values are used
Heterogeneity of information
What type of digital objects


Of interest for TERENA OER

                             We need a filtering mechanism
                              to keep only the LO that are
                                  suitable for TERENA
which languages are used for the
annotation
Language distribution for title
Language distribution for description
Language distribution for keyword
English can be the main language
supported on the TERENA OER Portal
Conclusions
• Metadata Analysis helps in:
   – Defining metadata aggregation element set.
   – Defining the type of services that can be provided by the TERENA OER
     portal e.g. browse by type of LOs, elements that can be used for full
     text search
   – Providing recommendations back to the providers about usage of
     metadata elements
   – Validating the metadata records at harvesting time


• Next steps
   – Extend to more repositories of TERENA TF-media network
   – Combine with results of an online survey for content providers
   – Develop the web based version of the tool and provide it as an open
     source tool
Thank you!
  Ilias Hatzakis,GRNET
hatzakis@admin.grnet.gr

Más contenido relacionado

Destacado

What's with the 1s and 0s? Making sense of binary data at scale with Tika and...
What's with the 1s and 0s? Making sense of binary data at scale with Tika and...What's with the 1s and 0s? Making sense of binary data at scale with Tika and...
What's with the 1s and 0s? Making sense of binary data at scale with Tika and...gagravarr
 
Apache Tika end-to-end
Apache Tika end-to-endApache Tika end-to-end
Apache Tika end-to-endgagravarr
 
Scientific data curation and processing with Apache Tika
Scientific data curation and processing with Apache TikaScientific data curation and processing with Apache Tika
Scientific data curation and processing with Apache TikaChris Mattmann
 
Metadata Strategies And Tools
Metadata Strategies And ToolsMetadata Strategies And Tools
Metadata Strategies And ToolsRachel Lovinger
 
Metadata Extraction and Content Transformation
Metadata Extraction and Content TransformationMetadata Extraction and Content Transformation
Metadata Extraction and Content TransformationAlfresco Software
 
Mapping, Merging, and Multilingual Taxonomies
Mapping, Merging, and Multilingual TaxonomiesMapping, Merging, and Multilingual Taxonomies
Mapping, Merging, and Multilingual TaxonomiesHeather Hedden
 
Metadata Use Cases You Can Use
Metadata Use Cases You Can UseMetadata Use Cases You Can Use
Metadata Use Cases You Can Usedmurph4
 
A Survey: Taxonomy Building Tools
A Survey: Taxonomy Building ToolsA Survey: Taxonomy Building Tools
A Survey: Taxonomy Building ToolsRachel Lovinger
 
Content Analysis with Apache Tika
Content Analysis with Apache TikaContent Analysis with Apache Tika
Content Analysis with Apache TikaPaolo Mottadelli
 
Content extraction with apache tika
Content extraction with apache tikaContent extraction with apache tika
Content extraction with apache tikaJukka Zitting
 

Destacado (12)

What's with the 1s and 0s? Making sense of binary data at scale with Tika and...
What's with the 1s and 0s? Making sense of binary data at scale with Tika and...What's with the 1s and 0s? Making sense of binary data at scale with Tika and...
What's with the 1s and 0s? Making sense of binary data at scale with Tika and...
 
Apache Tika end-to-end
Apache Tika end-to-endApache Tika end-to-end
Apache Tika end-to-end
 
Scientific data curation and processing with Apache Tika
Scientific data curation and processing with Apache TikaScientific data curation and processing with Apache Tika
Scientific data curation and processing with Apache Tika
 
Metadata Strategies And Tools
Metadata Strategies And ToolsMetadata Strategies And Tools
Metadata Strategies And Tools
 
Metadata Extraction and Content Transformation
Metadata Extraction and Content TransformationMetadata Extraction and Content Transformation
Metadata Extraction and Content Transformation
 
Mapping, Merging, and Multilingual Taxonomies
Mapping, Merging, and Multilingual TaxonomiesMapping, Merging, and Multilingual Taxonomies
Mapping, Merging, and Multilingual Taxonomies
 
Metadata Use Cases You Can Use
Metadata Use Cases You Can UseMetadata Use Cases You Can Use
Metadata Use Cases You Can Use
 
Taxonomy Interoperability Standards
Taxonomy Interoperability StandardsTaxonomy Interoperability Standards
Taxonomy Interoperability Standards
 
A Survey: Taxonomy Building Tools
A Survey: Taxonomy Building ToolsA Survey: Taxonomy Building Tools
A Survey: Taxonomy Building Tools
 
Content Analysis with Apache Tika
Content Analysis with Apache TikaContent Analysis with Apache Tika
Content Analysis with Apache Tika
 
Metadata in Business Intelligence
Metadata in Business IntelligenceMetadata in Business Intelligence
Metadata in Business Intelligence
 
Content extraction with apache tika
Content extraction with apache tikaContent extraction with apache tika
Content extraction with apache tika
 

Similar a TERENA OER portal, metadata extraction analysis, LAK, Leuven @9apr2013

The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research ObjectsCarole Goble
 
GLOBE Metadata Analysis
GLOBE Metadata AnalysisGLOBE Metadata Analysis
GLOBE Metadata AnalysisXavier Ochoa
 
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...Angelo Salatino
 
Learnometrics: Metrics for Learning Objects
Learnometrics: Metrics for Learning ObjectsLearnometrics: Metrics for Learning Objects
Learnometrics: Metrics for Learning ObjectsXavier Ochoa
 
Gaining Advantage in e-Learning with Semantic Adaptive Technology
Gaining Advantage in e-Learning with Semantic Adaptive TechnologyGaining Advantage in e-Learning with Semantic Adaptive Technology
Gaining Advantage in e-Learning with Semantic Adaptive TechnologyOntotext
 
Open Educational Resources - OER
Open Educational Resources - OEROpen Educational Resources - OER
Open Educational Resources - OERJad Najjar
 
DataONE_Guided Metadata Improvement
DataONE_Guided Metadata ImprovementDataONE_Guided Metadata Improvement
DataONE_Guided Metadata ImprovementLindsay Powers
 
OER for repository managers
OER for repository managersOER for repository managers
OER for repository managersNick Sheppard
 
Current and emerging trends in library services
Current and emerging trends in library servicesCurrent and emerging trends in library services
Current and emerging trends in library servicesNikesh Narayanan
 
Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...Ken Karapetyan
 
Semantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBISemantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBISimon Jupp
 
An Architecture based on Linked Data technologies for the Integration of OER ...
An Architecture based on Linked Data technologies for the Integration of OER ...An Architecture based on Linked Data technologies for the Integration of OER ...
An Architecture based on Linked Data technologies for the Integration of OER ...The Open Education Consortium
 
Preliminary Discussion on a Digital Curation Framework for Learning Repositories
Preliminary Discussion on a Digital Curation Framework for Learning RepositoriesPreliminary Discussion on a Digital Curation Framework for Learning Repositories
Preliminary Discussion on a Digital Curation Framework for Learning RepositoriesNikos Palavitsinis, PhD
 
Local content in a Europeana cloud for small & medium content providers
Local content in a Europeana cloud for small & medium content providersLocal content in a Europeana cloud for small & medium content providers
Local content in a Europeana cloud for small & medium content providerslocloud
 
Building an ecosystem of networked references
Building an ecosystem of networked referencesBuilding an ecosystem of networked references
Building an ecosystem of networked referencesHugo Manguinhas
 
EOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryEOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryCarole Goble
 

Similar a TERENA OER portal, metadata extraction analysis, LAK, Leuven @9apr2013 (20)

Presentation FAIRsFAIR workshop (April 2020)
Presentation FAIRsFAIR workshop (April 2020)Presentation FAIRsFAIR workshop (April 2020)
Presentation FAIRsFAIR workshop (April 2020)
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
 
FAIR data requires FAIR ontologies, how do we do?
FAIR data requires FAIR ontologies, how do we do?FAIR data requires FAIR ontologies, how do we do?
FAIR data requires FAIR ontologies, how do we do?
 
GLOBE Metadata Analysis
GLOBE Metadata AnalysisGLOBE Metadata Analysis
GLOBE Metadata Analysis
 
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
 
Learnometrics: Metrics for Learning Objects
Learnometrics: Metrics for Learning ObjectsLearnometrics: Metrics for Learning Objects
Learnometrics: Metrics for Learning Objects
 
Gaining Advantage in e-Learning with Semantic Adaptive Technology
Gaining Advantage in e-Learning with Semantic Adaptive TechnologyGaining Advantage in e-Learning with Semantic Adaptive Technology
Gaining Advantage in e-Learning with Semantic Adaptive Technology
 
Open Educational Resources - OER
Open Educational Resources - OEROpen Educational Resources - OER
Open Educational Resources - OER
 
DataONE_Guided Metadata Improvement
DataONE_Guided Metadata ImprovementDataONE_Guided Metadata Improvement
DataONE_Guided Metadata Improvement
 
OER for repository managers
OER for repository managersOER for repository managers
OER for repository managers
 
Loupe model - Use Cases and Requirements
Loupe model - Use Cases and Requirements Loupe model - Use Cases and Requirements
Loupe model - Use Cases and Requirements
 
Current and emerging trends in library services
Current and emerging trends in library servicesCurrent and emerging trends in library services
Current and emerging trends in library services
 
Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...
 
Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...
 
Semantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBISemantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBI
 
An Architecture based on Linked Data technologies for the Integration of OER ...
An Architecture based on Linked Data technologies for the Integration of OER ...An Architecture based on Linked Data technologies for the Integration of OER ...
An Architecture based on Linked Data technologies for the Integration of OER ...
 
Preliminary Discussion on a Digital Curation Framework for Learning Repositories
Preliminary Discussion on a Digital Curation Framework for Learning RepositoriesPreliminary Discussion on a Digital Curation Framework for Learning Repositories
Preliminary Discussion on a Digital Curation Framework for Learning Repositories
 
Local content in a Europeana cloud for small & medium content providers
Local content in a Europeana cloud for small & medium content providersLocal content in a Europeana cloud for small & medium content providers
Local content in a Europeana cloud for small & medium content providers
 
Building an ecosystem of networked references
Building an ecosystem of networked referencesBuilding an ecosystem of networked references
Building an ecosystem of networked references
 
EOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryEOSC-Life Workflow Collaboratory
EOSC-Life Workflow Collaboratory
 

Último

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 

Último (20)

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 

TERENA OER portal, metadata extraction analysis, LAK, Leuven @9apr2013

  • 1. Extraction and Visualization of Metadata Analytics for Multimedia Learning Object Repositories: The case of TERENA TF-media network Kostas Vogias1, Ilias Hatzakis1, Nikos Manouselis2, Peter Szegedi3 1 Greek Research and Technology Network 2 Agro-Know Technologies 3 TERENA TF-media Workshop on Learning Object Analytics for Collections, Repositories and Federations, 9 April, 2013
  • 2. Metadata analysis is not something new • Ochoa, Xavier, Klerkx, Joris, Vandeputte, Bram, and Duval, Erik. – On the Use of Learning Object Metadata: The GLOBE Experience. • Made the first fully quantitative study in a large number of Learning Repositories that belongs to a large organization like Globe • Neven, Filip and Duval, Erik. – Reusable Learning Objects: a Survey of LOM-Based Repositories. • Zschocke, Thomas and Beniest, Jan and Paisley, Courtney and Najjar, Jehad and Duval, Erik. – The LOM application profile for agricultural learning resources of the CGIAR • studied the use of LOM for the indexing of learning resources • Manouselis, N, Salokhe, G, Keizer, J, and Rudgard, S. – Towards a Harmonization of Metadata Application Profiles for Agricultural Learning Repositories. • Made an analysis of the metadata schemas used by Repositories including Agricultural Learning Resources.
  • 3. Why extraction of metadata analytics is important when we are developing a learning portal Which metadata schema is used by our content providers? How the providers are using different elements of the schema? On which metadata schema our learning portal should rely? Can we provide services based on metadata elements such as Portal design decisions Portal design decisions Subject, Type, Format, Keywords, Title and Descriptions? Which languages can our portal support?
  • 4. Study Objectives • To perform a quantified study on the different metadata schemas used by TERENA TF media network • To propose a metadata schema on which the TERENA OER portal will be based • To verify if metadata analytics can constitute a tool that can facilitate the development of a learning portal that is based on metadata records aggregated by various content providers
  • 5. What Terena OER Project is • A European level metadata aggregation portal for Open Educational Resources (primarily audiovisual contents, recorded lectures) collected and maintained by institutional and national content repositories of the Research & Education Community. • Main objectives of the project – Create a broker for national learning resource organizations. – Bridge the gap between the national repositories and the emerging global repositories (e.g., GLOBE) by establishing a European level metadata repository (i.e. aggregation point) for the national repositories acting in the R&E community. The European level repository will be a metadata repository only, the content remains in its original content repository.
  • 6. Which Data Providers • Successfully harvested. Repository Name Records Harvested Metadata Schema Switch Collection 619 oai_dc DSpace at University Of Latvia 1009 oai_dc OBAA Repository 56 oai_dc RiuNet: Repositori Institucional 21902 oai_dc de la Universitat Politècnica de 58.000 + instances València SCAM Repository 7351 oai_lom Material Audiovisual ofrecido por 978 oai_dc el Campus do Mar wikiwijs 26054 oai_dc Małopolskie Towarzystwo 181 oai_dc Genealogiczne Select the ones that can be used for the first version of TERENA OER portal
  • 7. how and what we used
  • 8. HOW • Repository Based Analysis. • Metrics – Element Completeness: The percentage of records in which an element has a value. – Relative Entropy: Diversity of values in an element. – Vocabulary values distribution: Format, Language and Type – Language properties: Attribute (e.g. lang=en) value usage frequency e.g. lang in free text metadata elements Title, Description and Keyword • The analysis was performed for a core set of metadata elements that is present in the studied repositories • Use of a standard set of mappings from DC to LOM
  • 9. What we have used • ARIADNE Harvester • A metadata analysis tool – Implemented using JAVA. – Metadata schema agnostic. – Metadata Analysis schemes: 1. Repository based. 2. Federation based. – Element based analysis: • Completeness • Relative Entropy • Specific element vocabulary extraction and usage frequency – Attribute based analysis. • Lang value attribute frequency – Input:XMLs – Output:CSV,TXT
  • 10. Current version of the tool
  • 11. Next version of the tool
  • 14. Element completeness Campus do Element Switch UOL OBAA RiuNet SCAM Mar wikiwijs Malopolskie Title 52.62 100.00 100.00 100.00 99.96 100.00 100.00 100.00 Description 47.67 0.10 0.00 0.17 99.96 100.00 0.00 99.44 Subject 52.45 100.00 0.00 96.84 10.49 100.00 0.00 100.00 Format 0.00 0.00 0.00 0.00 73.31 100.00 0.00 98.89 Identifier 52.62 100.00 100.00 100.00 100.00 97.24 0.00 100.00 Language 3.10 95.34 0.00 99.91 92.45 100.00 100.00 100.00 Type 52.33 99.31 0.00 96.57 74.12 100.00 100.00 100.00 Publisher 47.38 44.84 0.00 43.97 65.43 100.00 100.00 97.78 Creator 52.45 92.76 89.09 99.67 65.43 100.00 0.00 32.78 Rights 52.62 0.20 0.00 100.00 89.27 100.00 96.07 0.00 Date 0.00 31.15 100.00 6.38 45.89 100.00 0.00 98.89 Relation 0.00 15.87 0.00 3.83 0.00 0.00 0.00 47.22 Coverage 0.00 67.66 0.00 97.94 0.00 0.00 0.00 100.00 Source 51.85 0.00 0.00 100.00 100.00 0.00 0.00 99.44
  • 15. Average Element Usage Define the mandatory elements for the schema of metadata aggregator
  • 18. What type of digital objects Of interest for TERENA OER We need a filtering mechanism to keep only the LO that are suitable for TERENA
  • 19. which languages are used for the annotation
  • 23. English can be the main language supported on the TERENA OER Portal
  • 24. Conclusions • Metadata Analysis helps in: – Defining metadata aggregation element set. – Defining the type of services that can be provided by the TERENA OER portal e.g. browse by type of LOs, elements that can be used for full text search – Providing recommendations back to the providers about usage of metadata elements – Validating the metadata records at harvesting time • Next steps – Extend to more repositories of TERENA TF-media network – Combine with results of an online survey for content providers – Develop the web based version of the tool and provide it as an open source tool
  • 25. Thank you! Ilias Hatzakis,GRNET hatzakis@admin.grnet.gr

Notas del editor

  1. This presentation is about extracting metadata analytics for Multimedia Learning Object Repositories. It was conducted by memebers of GRNET in collaboration with Nikos Manouselis from Agro-Know Technologies and Peter Szegedi from Terena TF Media.
  2. There are many previous studies that have worked on the analysis of the metadata used content providers. So our work is based on a concept and methods that have been previous defined. The most relevant work to our study is the first fully quantified study in a large number of Learning Repositories that belongs to Globe.
  3. Let’s start by pointing why a study on metadata analytics is important when we are developing a learning portal that is based on metadata aggregation from a number of providers. When we are developing such a learning portal questions like these are born. Most of these questions are about portal design decisions and thus very critical for the development of the portal.
  4. The main objectives of our study are:
  5. Let’s first say some words about TERENA OER Project. The goal of this project is to set up a European level metadata aggregation portal for ….. It aims at creating a broken ….. and a bridge between ….
  6. Here you can see the list of the data providers that were successfully harvested. The vast majority of the providers are using Dublin Core as a format for exposing the metadata. The goal is to identify these providers that can be used for the first version of the TERENA OER portal.
  7. We have 58K+ instances and we need to select the data providers that can be used for the first version of the TERENA OER portal
  8. How we did we conduct the study. We performed a repository based and we studied:
  9. What we have used to conduct this study is the Ariadne Harvester in order to aggregate the metadata records and fro the metadata analysis we have developed a tool that will ….
  10. We have this version now bu
  11. we are working on such a version which will include GUI that will allow to researchers to easily estimate metadata analytics and visualize them.
  12. A heat map table for the element usage. One can see (++CLICK++) that elements such as title, Identifier, Language, Type and Creator are generally used by the studied repositories. The repository with the most rich metadata is the repository of campus do Mar, SCAM, Malopolskie and RiuNet
  13. In this chart the average element usage values for all the studied repositories is presented. It is evident again elements like Title, identifier, language, type and publisher are highly used almost by all repositories. These elements can be candidates as mandatory/core elements of the aggregator metadata schema.
  14. Zero entropy means that the values are null or we have the same value for all elements. Format element is used only by one repository and thus the entropy is zero for the rest. High values of entropy indicates heterogeneity/diversity of info. Type of learning object has high values in many repositories and this means that we have various type of objects. Further, this means that if TERENA OER will include only audiovisual material some kind of filtering will be needed. For Campus do Mar all the entropy for all the values were found zero and this is due to the fact that all the records has the same values for these elements.
  15. The main conclusion from language analysis of free text elements is that if all the repositories will be aggregated then English can be the main language supported on TERENA OER Portal.