A presentation on "Extraction and Visualization of Metadata Analytics for Multimedia Learning Object Repositories: The case of TERENA TF-media network OER portal" presented at the LACRO workshop of the LAK Conference, on April 9th, 2013
TERENA OER portal, metadata extraction analysis, LAK, Leuven @9apr2013
1. Extraction and Visualization of Metadata
Analytics for Multimedia Learning Object
Repositories: The case of TERENA TF-media
network
Kostas Vogias1, Ilias Hatzakis1, Nikos
Manouselis2, Peter Szegedi3
1
Greek Research and Technology Network
2
Agro-Know Technologies
3
TERENA TF-media
Workshop on Learning Object Analytics for Collections,
Repositories and Federations, 9 April, 2013
2. Metadata analysis is not something new
• Ochoa, Xavier, Klerkx, Joris, Vandeputte, Bram, and Duval, Erik.
– On the Use of Learning Object Metadata: The GLOBE Experience.
• Made the first fully quantitative study in a large number of Learning
Repositories that belongs to a large organization like Globe
• Neven, Filip and Duval, Erik.
– Reusable Learning Objects: a Survey of LOM-Based Repositories.
• Zschocke, Thomas and Beniest, Jan and Paisley, Courtney and
Najjar, Jehad and Duval, Erik.
– The LOM application profile for agricultural learning resources of the CGIAR
• studied the use of LOM for the indexing of learning resources
• Manouselis, N, Salokhe, G, Keizer, J, and Rudgard, S.
– Towards a Harmonization of Metadata Application Profiles for Agricultural
Learning Repositories.
• Made an analysis of the metadata schemas used by Repositories including
Agricultural Learning Resources.
3. Why extraction of metadata analytics is
important when we are developing a learning
portal
Which metadata schema is used by our content providers?
How the providers are using different elements of the schema?
On which metadata schema our learning portal should rely?
Can we provide services based on metadata elements such as
Portal design decisions
Portal design decisions
Subject, Type, Format, Keywords, Title and Descriptions?
Which languages can our portal support?
4. Study Objectives
• To perform a quantified study on the different metadata
schemas used by TERENA TF media network
• To propose a metadata schema on which the TERENA OER
portal will be based
• To verify if metadata analytics can constitute a tool that
can facilitate the development of a learning portal that is
based on metadata records aggregated by various content
providers
5. What Terena OER Project is
• A European level metadata aggregation portal for Open
Educational Resources (primarily audiovisual contents,
recorded lectures) collected and maintained by institutional
and national content repositories of the Research &
Education Community.
• Main objectives of the project
– Create a broker for national learning resource organizations.
– Bridge the gap between the national repositories and the
emerging global repositories (e.g., GLOBE) by establishing a
European level metadata repository (i.e. aggregation point)
for the national repositories acting in the R&E community. The
European level repository will be a metadata repository only,
the content remains in its original content repository.
6. Which Data Providers
• Successfully harvested.
Repository Name Records Harvested Metadata Schema
Switch Collection 619 oai_dc
DSpace at University Of Latvia 1009 oai_dc
OBAA Repository 56 oai_dc
RiuNet: Repositori Institucional 21902 oai_dc
de la Universitat Politècnica de 58.000 + instances
València
SCAM Repository 7351 oai_lom
Material Audiovisual ofrecido por 978 oai_dc
el Campus do Mar
wikiwijs 26054 oai_dc
Małopolskie Towarzystwo 181 oai_dc
Genealogiczne
Select the ones that can be used for the first version of TERENA OER
portal
8. HOW
• Repository Based Analysis.
• Metrics
– Element Completeness: The percentage of records in which
an element has a value.
– Relative Entropy: Diversity of values in an element.
– Vocabulary values distribution: Format, Language and
Type
– Language properties: Attribute (e.g. lang=en) value usage
frequency e.g. lang in free text metadata elements Title,
Description and Keyword
• The analysis was performed for a core set of metadata
elements that is present in the studied repositories
• Use of a standard set of mappings from DC to LOM
9. What we have used
• ARIADNE Harvester
• A metadata analysis tool
– Implemented using JAVA.
– Metadata schema agnostic.
– Metadata Analysis schemes:
1. Repository based.
2. Federation based.
– Element based analysis:
• Completeness
• Relative Entropy
• Specific element vocabulary extraction and usage frequency
– Attribute based analysis.
• Lang value attribute frequency
– Input:XMLs
– Output:CSV,TXT
23. English can be the main language
supported on the TERENA OER Portal
24. Conclusions
• Metadata Analysis helps in:
– Defining metadata aggregation element set.
– Defining the type of services that can be provided by the TERENA OER
portal e.g. browse by type of LOs, elements that can be used for full
text search
– Providing recommendations back to the providers about usage of
metadata elements
– Validating the metadata records at harvesting time
• Next steps
– Extend to more repositories of TERENA TF-media network
– Combine with results of an online survey for content providers
– Develop the web based version of the tool and provide it as an open
source tool
This presentation is about extracting metadata analytics for Multimedia Learning Object Repositories. It was conducted by memebers of GRNET in collaboration with Nikos Manouselis from Agro-Know Technologies and Peter Szegedi from Terena TF Media.
There are many previous studies that have worked on the analysis of the metadata used content providers. So our work is based on a concept and methods that have been previous defined. The most relevant work to our study is the first fully quantified study in a large number of Learning Repositories that belongs to Globe.
Let’s start by pointing why a study on metadata analytics is important when we are developing a learning portal that is based on metadata aggregation from a number of providers. When we are developing such a learning portal questions like these are born. Most of these questions are about portal design decisions and thus very critical for the development of the portal.
The main objectives of our study are:
Let’s first say some words about TERENA OER Project. The goal of this project is to set up a European level metadata aggregation portal for ….. It aims at creating a broken ….. and a bridge between ….
Here you can see the list of the data providers that were successfully harvested. The vast majority of the providers are using Dublin Core as a format for exposing the metadata. The goal is to identify these providers that can be used for the first version of the TERENA OER portal.
We have 58K+ instances and we need to select the data providers that can be used for the first version of the TERENA OER portal
How we did we conduct the study. We performed a repository based and we studied:
What we have used to conduct this study is the Ariadne Harvester in order to aggregate the metadata records and fro the metadata analysis we have developed a tool that will ….
We have this version now bu
we are working on such a version which will include GUI that will allow to researchers to easily estimate metadata analytics and visualize them.
A heat map table for the element usage. One can see (++CLICK++) that elements such as title, Identifier, Language, Type and Creator are generally used by the studied repositories. The repository with the most rich metadata is the repository of campus do Mar, SCAM, Malopolskie and RiuNet
In this chart the average element usage values for all the studied repositories is presented. It is evident again elements like Title, identifier, language, type and publisher are highly used almost by all repositories. These elements can be candidates as mandatory/core elements of the aggregator metadata schema.
Zero entropy means that the values are null or we have the same value for all elements. Format element is used only by one repository and thus the entropy is zero for the rest. High values of entropy indicates heterogeneity/diversity of info. Type of learning object has high values in many repositories and this means that we have various type of objects. Further, this means that if TERENA OER will include only audiovisual material some kind of filtering will be needed. For Campus do Mar all the entropy for all the values were found zero and this is due to the fact that all the records has the same values for these elements.
The main conclusion from language analysis of free text elements is that if all the repositories will be aggregated then English can be the main language supported on TERENA OER Portal.