Good afternoon and welcome to this session.
This is a recording presenting the work “Digital Representation of Rights for Language Resources”, by VRD (speaking now) and PL.
This work is the result of a joint effort of UPM (Madrid, Spain) and the ILSP (Athens, Greece), in tight cooperation with the EU-funded LIDER Project and the W3C LD4LT community group, working heavily with resources and licenses from the META-SHARE network.
In this presentation, it will be shown how the most commonly used licenses for language resources can be digitally represented, reusing existing vocabularies and extending the Open Digital Rights Language core model.
The most important information in the licenses will be represented as RDF.
Practical examples and guidelines for using the ‘Rights Information for Language Resources’ vocabulary will be given.
Under the umbrella term “Language Resources”, we find a number of different items like dictionaries, lexicons, thesauri, semantic networks, written and spoken corpora, POS taggers, syntactic analyzers, ontologies, term banks, phonetic databases…
Some of these heterogeneous resources are actually databases, some are pieces of software, some are pure creative works.
word sense disambiguation service
translation web service
The production of these resources sometimes requires a significant effort, which is acknowledged by almost every legal system.
In general LANGUAGE RESOURCES are protected by the law.
Most of the resources qualify as intellectual property works and as such are protected by copyright. Some receive full protection as works (like ontologies or computer programs), some a reduced protection as databases. Language resources are not, however, patentable.
In any case, for their safe consumption, rights information must be present. In particular, licenses determine which uses are allowed under which conditions.
In fact, one of the priorities set by the FLARENET Strategic Research Agenda is the availability of LRs “within an adequate IPR and legal framework”. The recommendations include the elaboration of “specific, simple and harmonised licensing solutions for data resources”, taking into account licensing schemes already in use, as well as the adoption of electronic licensing and the adaptation of current distribution models to new media.
The most important parts of a license can be represented in a digital form as XML or RDF.
This is not a new idea, and was also supported by the Creative Commons licenses, which include a legal code, a summary and a purely electronic version.
Digital licensing has a number of advantages:
1. Improved understanding of the licensing terms by human users.
Although licenses are texts in natural language, the legal jargon may not be easily understood by newcomers. A harmonised vocabulary for licensing terms favours universal understanding of their precise meaning.
2. Processing of the licensing terms by machines.
Computer programs can make decisions based on the license, like selectively granting access or permitting or denying the combination of differently licensed resources.
3. Enhanced search and discovery of LRs. Querying by licensing terms becomes possible, for example limiting a search to resources where the license allows commercial use, creation of derivative works, etc.
4. Finally, better management, preservation and interoperation of the LRs by publishers, who have a clearer account of which rights have been granted to which resources.
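As a rough sketch of points 2 and 3, assume a hypothetical in-memory catalog in which each resource lists the actions its license permits. The resource names and term labels below are invented for illustration and do not come from any real catalog. A program could then filter resources by licensing terms like this:

```python
# Hypothetical catalog: names and permitted actions are illustrative only.
CATALOG = [
    {"name": "corpus-a", "permits": {"reproduction", "derivatives", "commercial_use"}},
    {"name": "lexicon-b", "permits": {"reproduction"}},
    {"name": "termbank-c", "permits": {"reproduction", "commercial_use"}},
]


def find_resources(catalog, required):
    """Return the names of resources whose license permits every required action."""
    required = set(required)
    # A resource matches when the required actions are a subset of what it permits.
    return [r["name"] for r in catalog if required <= r["permits"]]


commercial = find_resources(CATALOG, ["commercial_use"])
commercial_and_derivatives = find_resources(CATALOG, ["commercial_use", "derivatives"])
```

Free-text license fields would not support this kind of query; the filter only works because every license uses the same small set of terms.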
The license is a piece of metadata describing a language resource, like the authorship or the creation date.
Language resources are sometimes collected in catalogs, and in this case licensing metadata appears within the resource but also as a metadata record in the catalog. This second case is more important.
Indeed, this duplication of information may lead to inconsistencies, which would be minor if rights were represented digitally in a uniform or interoperable manner.
These language resource catalogs are actually databases with the metadata of the resources.
All of them handle licensing information, as it is crucial for the safe consumption of resources in industrial settings.
Examples of such catalogs are the OLAC Language Resource Catalog, META-SHARE, CLARIN, the CLARIN Virtual Language Observatory, Datahub.io, LREMap and Linghub.
We have studied how each of these repositories handles the licensing information, finding three possible scenarios.
Licensing info as free text
Catalogs where the rights information is loosely represented as a free-text metadata element: this is mainly the case for portals harvesting from various sources, such as OLAC, the LRE Map and the CLARIN Virtual Language Observatory (VLO).
Licensing info as a choice among several possibilities
META-SHARE and partly Datahub and the CLARIN network repositories
Licensing info as a more complex rights expression
The META-SHARE ontology defined a richer set of combinable elements to build licenses.
The latter is the most complete option. For example, faceted browsing by access rights or license is a feature integrated in most of the catalogs mentioned before, but META-SHARE additionally allows faceted browsing with a filter for conditions of use (e.g. whether the license allows commercial use, derivatives, etc.).
We find the META-SHARE repository the most interesting case from a licensing point of view.
The META-SHARE network includes 13 resource repositories, with over 1200 resource packages.
The META-SHARE (MS) metadata schema constitutes an essential ingredient of the META-SHARE infrastructure.
META-SHARE creates a “space” within which different LRs may be shared under specific licensing terms
The original abstract META-SHARE metadata model was first implemented as an XML Schema.
A resource in META-SHARE is described with an XML document adhering to that schema. Some elements are obligatory (the minimal version), some are recommended and some are optional (which together give a maximal version).
The model contains 5 types of entities, where the most important one was the resource, specialized by the langresource.
The language resource contained information on the resource, on the version, other metadata and also DISTRIBUTION INFO.
The core of the schema is the resourceInfo component, which includes administrative components relevant to all LRs, like identification, usage info or media type.
Licenses are root entities, but licensing terms are present as well as features of the distribution info: its availability, detailed conditions or the IPR holder
The META-SHARE Metadata model has been ported to OWL/RDF by Marta Villegas, and has also largely influenced the ontology presented in the session before.
The work presented today proposes a model specifically addressing the licensing elements in a more standardized manner.
This rights ontology builds upon the META-SHARE schema for the LanguageResource and the Distribution classes and for the License class builds on the ODRL model.
The ontology defines 4 main classes: language resource, distribution, license and conditions of use.
A language resource is represented with a class instance of the first, which is connected with one or more distributions through the “distribution” object property. Every distribution may have one or more licenses (dual licensing is permitted) and the license can be further specified with conditions of use.
The language resource is characterized by its availability, which may be restricted, unrestricted or under negotiation.
The distribution is characterized by its access medium and can be limited depending on the nature of the user (whether it is a member of a consortium, an academic user, a commercial user, etc.).
The author of the language resource may have licensed the distribution rights to different persons, consequently there may be one rights holder of the distribution rights per type of distribution.
Finally, the license can belong to a license category and can be further specialized by different conditions of use
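To make the chain from resource to conditions of use concrete, here is a minimal sketch using plain Python dataclasses. The class and field names are simplifications chosen for this example, not the identifiers used in the ontology, and the sample values are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class License:
    name: str
    category: str = ""                                   # optional license category
    conditions_of_use: List[str] = field(default_factory=list)


@dataclass
class Distribution:
    access_medium: str
    rights_holder: str                                   # one holder per type of distribution
    licenses: List[License] = field(default_factory=list)  # more than one means dual licensing


@dataclass
class LanguageResource:
    name: str
    availability: str                                    # restricted, unrestricted or under negotiation
    distributions: List[Distribution] = field(default_factory=list)


def conditions_for(resource):
    """Map each license name reachable from the resource to its conditions of use."""
    return {
        lic.name: lic.conditions_of_use
        for dist in resource.distributions
        for lic in dist.licenses
    }


# Illustrative instance: one distribution offered under two licenses.
research = License("research-only", "example-category", ["attribution", "noncommercial"])
commercial = License("commercial", "example-category", ["compensation"])
download = Distribution("download", "Example Rights Holder", [research, commercial])
resource = LanguageResource("example-lexicon", "restricted", [download])
```

Calling `conditions_for(resource)` walks from the resource through its distributions to each license, which is the same path a query over the RDF data would follow.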
These conditions of use include
a) Obligations, like attribution, compensation (that is to say, payments) or informing the licensor of further uses.
b) Prohibitions, like making commercial use or redistributions.
c) Conditions, for example based on the purpose (educational, evaluation, etc.).
This is a surprisingly flat list of license terms, but one which can be mapped directly to the existing META-SHARE records in XML.
In contrast, Rights Expression Languages (or RELs) define the license features in a structured fashion.
For example, ccREL (the Creative Commons REL) presents this structure, where a license permits, prohibits or requires different actions. These actions include reproduction, distribution or making derivative works.
The MPEG-21 Media Contract Ontology has similar elements, also including the deontic modalities of permission, obligation and prohibition.
Finally, the ODRL core ontology permits representing licenses as collections of rules.
Again, every rule can be a permission, a prohibition or an obligation, exercised over an asset and possibly by a specific party.
Constraints are generic and include an operator and a right operand.
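The rule structure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the small operator table and the evaluation policy (a matching prohibition overrides any permission) are choices made for this example and are not taken from the ODRL specification.

```python
import operator
from dataclasses import dataclass, field
from typing import List

# A small illustrative subset of comparison operators a constraint may use.
OPERATORS = {"eq": operator.eq, "neq": operator.ne, "lt": operator.lt, "gt": operator.gt}


@dataclass
class Constraint:
    name: str              # what is being constrained, e.g. "purpose"
    op: str                # key into OPERATORS
    right_operand: object  # value the context is compared against

    def holds(self, context):
        # A constraint on a value missing from the context is treated as not satisfied.
        return self.name in context and OPERATORS[self.op](context[self.name], self.right_operand)


@dataclass
class Rule:
    kind: str              # "permission", "prohibition" or "duty"
    action: str
    constraints: List[Constraint] = field(default_factory=list)

    def applies(self, action, context):
        return self.action == action and all(c.holds(context) for c in self.constraints)


def is_allowed(rules, action, context):
    """Assumed policy: allowed if some permission applies and no prohibition does."""
    if any(r.kind == "prohibition" and r.applies(action, context) for r in rules):
        return False
    return any(r.kind == "permission" and r.applies(action, context) for r in rules)


# Illustrative policy: reproduction for research only, no distribution at all.
policy = [
    Rule("permission", "reproduce", [Constraint("purpose", "eq", "research")]),
    Rule("prohibition", "distribute"),
]
```

With this policy, `is_allowed(policy, "reproduce", {"purpose": "research"})` evaluates the single constraint against the request context, while any request to distribute is rejected by the prohibition.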
ODRL 2.1 is a policy and rights expression language suitable to represent the licensing terms of the language resources. ODRL specifies both an abstract core model and a common vocabulary, which can be extended for the particular domains ODRL is applied to (like eBooks, mobile devices or the news industry).
ODRL is the most natural choice for expressing licenses and policies in RDF, and its expressions can be used within the Rights Information for Language Resources Ontology.
However this creates a duality: both the structured ODRL and the flat META-SHARE forms are possible to represent the same license. The slide shows an example of license in Turtle using the ODRL model and the META-SHARE model.
However, a set of SPARQL queries can easily transform one form into the other, as long as the ODRL structures are simple (not nested).
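The idea behind that transformation can be illustrated without SPARQL. The sketch below is plain Python, not the queries mentioned above, and the mapping table is hypothetical: the term names are invented for the example. It shows why the conversion is easy in the simple case, since each flat term corresponds to exactly one rule kind and action.

```python
# Hypothetical mapping between flat condition terms and (rule kind, action) pairs.
FLAT_TO_RULE = {
    "attribution": ("duty", "attribute"),
    "compensate": ("duty", "compensate"),
    "nonCommercialUse": ("prohibition", "commercialize"),
    "noRedistribution": ("prohibition", "distribute"),
}
# The mapping is one-to-one, so it can simply be inverted.
RULE_TO_FLAT = {rule: term for term, rule in FLAT_TO_RULE.items()}


def flat_to_structured(terms):
    """Group flat condition terms into duties and prohibitions."""
    policy = {"duty": [], "prohibition": []}
    for term in terms:
        kind, action = FLAT_TO_RULE[term]
        policy[kind].append(action)
    return policy


def structured_to_flat(policy):
    """Flatten a non-nested policy back into a list of condition terms."""
    return [
        RULE_TO_FLAT[(kind, action)]
        for kind in ("duty", "prohibition")
        for action in policy.get(kind, [])
    ]


structured = flat_to_structured(["attribution", "noRedistribution"])
flat_again = structured_to_flat(structured)
```

Once rules carry constraints or nested duties, a single flat term no longer captures them, which is why the round trip is limited to simple structures.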
An innovative feature of the Rights Information for Language Resources Ontology is the use of license templates.
A license template is a license where some fields are left incomplete.
These licensing patterns are public and immutable, and can be referenced again and again, avoiding verbose licenses.
Thus, two different resources may refer to the same license template with a single triple, while each of them specifies a different price.
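A minimal sketch of the template idea, in plain Python rather than RDF: the template is a shared record whose open fields are marked with `None`, and each resource supplies only the missing values. The field names and amounts are invented for illustration.

```python
# A shared template: fields set to None are left open for each resource to fill.
TEMPLATE = {
    "permission": "reproduce",
    "duty": "compensate",
    "amount": None,
    "currency": None,
}


def instantiate(template, **values):
    """Return a complete license by filling in the template's open fields only."""
    open_fields = {key for key, value in template.items() if value is None}
    fixed = set(values) - open_fields
    if fixed:
        # The template is immutable: already-filled fields cannot be overridden.
        raise ValueError(f"cannot override template fields: {sorted(fixed)}")
    missing = open_fields - set(values)
    if missing:
        raise ValueError(f"missing values for: {sorted(missing)}")
    # Build a new dict so the shared template itself is never modified.
    return {**template, **values}


# Two resources reuse the same template with different prices.
license_a = instantiate(TEMPLATE, amount=100, currency="EUR")
license_b = instantiate(TEMPLATE, amount=250, currency="EUR")
```

Both licenses share every term of the template and differ only in the price, while the template itself stays unchanged.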
The schema of a META-SHARE license is presented in this slide.
In this case, we are representing the META-SHARE commercial, no-redistribution policy.
This is represented by means of a license which has only one permission. This permission allows making reproductions, making derivative works, making commercial use and exercising database rights.
However, there is a duty (attribution) and a prohibition (further distribution). Further, the permission is constrained to the purpose of language engineering research, and the location must be the assignee's site.
The same example, this time in Turtle, is shown in this slide. It is unabridged.
The main resource in the license is an odrl:Policy (line 02), to which some metadata elements are attached: version (03), label (04), alternative name (05) and location of the legal code (10). The policy additionally has information regarding the language and a flat list of the conditions (ms:NoRedistribution, cc:Attribution, etc., in lines 07-09). The main permission (lines 12-25), which explicitly authorizes making derivative works and making commercial use, has the duty of attribution (15-17) and the constraints of being used only for language engineering purposes (lines 18-21) and on the users’ site (lines 21-24). Distribution is forbidden in lines 26-28.
And we are concluding the presentation
This paper has presented the Rights Information for Language Resources Ontology, specified in the framework of the W3C Linked Data for Language Technology Group. It is expected to enhance the accessibility of language resources, to ease the publication of licenses as linked data, and to enable the automatic processing of licenses by web services and other tools.
The URI shown in the slide leads to the ontology specification, grounded with a formal list of requirements and illustrated with examples.
In the future, we expect to improve on the model, especially as regards user modelling, as well as formalizing constraints for the data structures. Finally, the use of SPARQL queries to move from the flat META-SHARE form to the ODRL-like policies has to be further documented. The same applies to the CONSTRUCT queries capable of automatically filling in the license templates.
Thanks for the attention and we hope to be there in person the next time.