Victor Rodriguez-Doncel
Ontology Engineering Group
Universidad Politécnica de
Madrid
Penny Labropoulou
Institute for Langu...
Digital representation of rights for language resources
Digital Representation of Rights for Language Resources, V. Rodriguez-Doncel, P. Labropoulou, in Proc. of the 4th Workshop on Linked Data in Linguistics: Resources and Applications, co-located with ACL-IJCNLP 2015.
Beijing, China, 31st July 2015.

  • Good afternoon and welcome to this session.

    This is a recording presenting the work “Digital Representation of Rights for Language Resources”, by Victor Rodriguez-Doncel (speaking now) and Penny Labropoulou.

    This work is the result of a joint effort of UPM (Madrid, Spain) and the ILSP (Athens, Greece), in close cooperation with the EU-funded LIDER Project and the W3C LD4LT community group, and it draws heavily on resources and licenses from the META-SHARE network.

    This presentation shows how the most commonly used licenses for language resources can be digitally represented, reusing existing vocabularies and extending the Open Digital Rights Language (ODRL) core model.

    (pause)
    The most important information in licenses will be represented as RDF.
    (pause)

    Practical examples and guidelines for using the ‘Rights Information for Language Resources’ vocabulary will be given.
  • Under the umbrella term “Language Resources”, we find a number of different items such as dictionaries, lexicons, thesauri, semantic networks, written and spoken corpora, POS taggers, syntactic analyzers, ontologies, term banks, phonetic databases…

    Some of these heterogeneous resources are actually databases, some are pieces of software, some are pure creative works.




    [Slide figure: word cloud generated with https://www.jasondavies.com/wordcloud/# showing types of language resource: lexicon, spoken corpus, term bank, dictionary, semantic network, phonetic database, thesaurus, application, POS tagger, Porter stemmer, aligned corpus, knowledge source, n-gram model, written corpus, ontology, tokenizer, concordancer, word sense disambiguation service, translation web service, morphology analyzer, syntactic analyzer.]
  • Producing these resources sometimes requires considerable effort, which is acknowledged by almost every legal system.
    In general, language resources are protected by law.

    Most resources qualify as intellectual property works and as such are protected by copyright. Some receive full protection as works (like ontologies or computer programs), some a reduced protection as databases. Language resources are not, however, patentable.

    In any case, for their safe consumption, rights information must be present. In particular, licenses determine which uses are allowed under which conditions.
  • Indeed, one of the priorities set by the FLARENET Strategic Research Agenda is the availability of LRs “within an adequate IPR and legal framework”. The recommendations include the elaboration of “specific, simple and harmonised licensing solutions for data resources”, taking into account licensing schemes already in use, as well as the adoption of electronic licensing and the adaptation of current distribution models to new media.

    The most important parts of a license can be represented in a digital form as XML or RDF.

    This is not a new idea, and was also supported by the Creative Commons licenses, which include a legal code, a summary and a purely electronic version.

  • Digital licensing brings a number of advantages.

    1. Better understanding of the licensing terms by human users.
    Although licenses are texts in natural language, the legal jargon may not be easily understood by newcomers. A harmonised vocabulary for licensing terms favours a universal understanding of their precise meaning.

    2. Processing of the licensing terms by machines.
    Computer programs can take decisions based on the license, like selectively granting access, or permitting or denying the combination of differently licensed resources.

    3. Enhanced search and discovery of LRs.
    Querying by licensing terms becomes possible, for example limiting a search to resources whose license allows commercial use, creation of derivative works, etc.

    4. Finally, better management, preservation and interoperation of the LRs by publishers, who have a clearer account of which rights have been granted to which resources.
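
As an illustration of the second advantage, the following is a minimal sketch of how a program might decide whether differently licensed resources can be combined for a given use. It assumes each license has already been reduced to a set of flat condition labels; the labels, the `BLOCKED_BY` mapping and the function names are hypothetical and are not taken from the ontology itself.

```python
# Hypothetical flat condition labels, loosely modelled on the slide examples.
NON_COMMERCIAL = "nonCommercialUse"
NO_DERIVATIVES = "noDerivatives"
NO_REDISTRIBUTION = "noRedistribution"

# Assumed mapping: which condition blocks which intended use.
BLOCKED_BY = {
    "commercialize": NON_COMMERCIAL,
    "derive": NO_DERIVATIVES,
    "distribute": NO_REDISTRIBUTION,
}


def is_use_allowed(conditions, use):
    """Return True unless the license carries the condition that blocks `use`."""
    blocker = BLOCKED_BY.get(use)
    return blocker is None or blocker not in conditions


def can_combine(licenses, use):
    """A combined product may be used in a given way only if every
    component license allows that use."""
    return all(is_use_allowed(conditions, use) for conditions in licenses)


corpus_license = {NON_COMMERCIAL}
lexicon_license = {NO_REDISTRIBUTION}

print(can_combine([corpus_license, lexicon_license], "derive"))
print(can_combine([corpus_license, lexicon_license], "commercialize"))
```

Real licenses carry more nuance than a set of labels, so a check like this can only serve as a first filter before a human reads the legal text.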

  • The license is a piece of metadata describing a language resource, like the authorship or the creation date
  • Language resources are sometimes collected in catalogs, and in this case the licensing metadata appears within the resource but also as a metadata record in the catalog. This second case is the more important one.
    This duplication of information may lead to inconsistencies, which would be minor if rights were represented digitally in a uniform or interoperable manner.
  • These language resource catalogs are actually databases holding the metadata of the resources.
    All of them handle licensing information, as it is crucial for the safe consumption of resources in industrial settings.
    Examples of such catalogs are the OLAC Language Resource Catalog, META-SHARE, CLARIN, the CLARIN Virtual Language Observatory, Datahub.io, LREMap and Linghub.





  • We have studied how each of these repositories handles licensing information, finding three possible scenarios.

    1. Licensing info as free text.
    Catalogs where the rights information is loosely represented as a free-text metadata element. This is mainly the case for portals harvesting from various sources, such as OLAC, the LRE Map and the CLARIN Virtual Language Observatory (VLO).
    2. Licensing info as a choice among several possibilities.
    META-SHARE and, partly, Datahub and the CLARIN network repositories.
    3. Licensing info as a more complex rights expression.
    The META-SHARE ontology defined a richer set of combinable elements to build licenses.

    The latter is the most complete option. For example, faceted browsing by access rights or license is a feature integrated in most of the catalogs mentioned before. In the case of META-SHARE, faceted browsing can additionally filter on conditions of use (e.g. whether the license allows commercial use, derivatives, etc.).
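
Faceted browsing on conditions of use can be sketched as follows. This is an illustrative stand-in in plain Python, not how any of the catalogs is implemented; the records, the resource names and the "example-..." license names are invented for the sketch.

```python
from collections import Counter

# Hypothetical catalogue records; names and conditions are illustrative only.
CATALOGUE = [
    {"name": "corpus-A", "license": "MS C-NoReD",
     "conditions": {"attribution", "noRedistribution"}},
    {"name": "lexicon-B", "license": "example-NC",
     "conditions": {"attribution", "nonCommercialUse"}},
    {"name": "tagger-C", "license": "example-open",
     "conditions": {"attribution"}},
]


def facet_counts(records):
    """Count how many records carry each condition of use (the facet values)."""
    return Counter(c for record in records for c in record["conditions"])


def filter_without(records, excluded):
    """Keep the records whose license carries none of the excluded conditions."""
    return [r for r in records if not (r["conditions"] & excluded)]


# Limit the search to resources that may be used commercially.
commercial_ok = filter_without(CATALOGUE, {"nonCommercialUse"})
print([r["name"] for r in commercial_ok])
print(facet_counts(CATALOGUE))
```

A filter like this is only possible in the second and third scenarios; with free-text licensing info there is no controlled value to match against.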
  • We find META-SHARE the most interesting case from a licensing point of view.
    The META-SHARE network includes 13 resource repositories, with over 1200 resource packages.

    The META-SHARE (MS) metadata schema is an essential ingredient of the META-SHARE infrastructure.
    META-SHARE creates a “space” within which different LRs may be shared under specific licensing terms.


  • The original abstract META-SHARE metadata model was first implemented as an XML Schema.

    A resource in META-SHARE is described with an XML document adhering to that schema. Some elements are obligatory (forming the minimal version), some recommended and some optional (together forming the maximal version).

    The model contains 5 types of entities, the most important being the resource, specialized by the langresource.
    The language resource contains information on the resource, on the version, other metadata and also distribution info.

    The core of the schema is the resourceInfo component, which includes administrative components relevant to all LRs, like identification, usage info or media type.

    Licenses are root entities, but licensing terms are also present as features of the distribution info: its availability, detailed conditions or the IPR holder.

    The META-SHARE metadata model has been ported to OWL/RDF by Marta Villegas, and has also largely influenced the ontology presented in the previous session.




  • The work presented today proposes a model specifically addressing the licensing elements in a more standardized manner.

    This rights ontology builds on the META-SHARE schema for the LanguageResource and Distribution classes, and on the ODRL model for the License class.
  • The ontology defines 4 main classes:
    Language resource
    Distribution
    License
    Conditions of use
    A language resource is represented as an instance of the first class, which is connected to one or more distributions through the “distribution” object property. Every distribution may have one or more licenses (dual licensing is permitted), and each license can be further specified with conditions of use.
  • The language resource is characterized by its availability, which may be restricted, unrestricted or under negotiation.
  • The distribution is characterized by its access medium and can be limited depending on the nature of the user (whether it is a member of a consortium, an academic user, a commercial user, etc.).

    The author of the language resource may have licensed the distribution rights to different persons; consequently, there may be one holder of the distribution rights per type of distribution.
  • Finally, the license can belong to a license category and can be further specialized by different conditions of use.
    These conditions of use include:
    a) obligations, like attribution, compensation (that is to say, payment) or informing the licensor of further uses;
    b) prohibitions, like making commercial use or redistributing;
    c) conditions, for example based on the purpose (educational, evaluation, etc.).

    This is a surprisingly flat list of license terms, but one that can be mapped directly to the existing META-SHARE records in XML.
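
The four classes and their links can be sketched as plain Python dataclasses. This is only a rough model of the structure described in the notes: the field names, the availability labels and the example values are assumptions made for the sketch, not the property names defined by the ontology.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class License:
    name: str
    # Flat list of conditions of use (obligations, prohibitions, conditions).
    conditions_of_use: List[str] = field(default_factory=list)


@dataclass
class Distribution:
    access_medium: str
    # More than one license on a distribution models dual licensing.
    licenses: List[License] = field(default_factory=list)


@dataclass
class LanguageResource:
    name: str
    availability: str  # assumed labels: "restricted", "unrestricted", "underNegotiation"
    distributions: List[Distribution] = field(default_factory=list)


def all_conditions(resource):
    """Collect the conditions of use per license across every distribution."""
    return {
        lic.name: list(lic.conditions_of_use)
        for dist in resource.distributions
        for lic in dist.licenses
    }


# Hypothetical dual-licensed resource.
research = License("research-license", ["nonCommercialUse", "attribution"])
commercial = License("commercial-license", ["compensation"])
resource = LanguageResource(
    "example-lexicon",
    "restricted",
    [Distribution("downloadable", [research, commercial])],
)
print(all_conditions(resource))
```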
  • In contrast, Rights Expression Languages (or RELs) define the license features in a structured fashion.
    For example, ccREL (the Creative Commons REL) presents this structure, where a license permits, prohibits or requires different actions. These actions include reproduction, distribution or making derivative works.
  • The MPEG-21 Media Contract Ontology (MCO) has similar elements, also including the deontic modalities of permission, obligation and prohibition.
  • Finally, the ODRL core ontology permits representing licenses as collections of rules.
    Again, every rule can be a permission, a prohibition or an obligation, exercised over an asset and possibly by a specific party.
    Constraints are generic and include an operator and a right operand.

    ODRL 2.1 is a policy and rights expression language suitable to represent the licensing terms of
    the language resources. ODRL specifies both an abstract core model and a common vocabulary,
    which can be extended for the particular domains ODRL is applied to (like eBooks, mobile devices or the news industry).

    ODRL is the most natural choice for expressing licenses and policies in RDF and its expressions can be used within the Rights Information for Language Resources Ontology
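
The idea of a license as a collection of rules with constraints can be sketched in plain Python. This is a simplified model, not an ODRL implementation: the class names, the "eq"-only operator support and the evaluation strategy (a matching prohibition always wins) are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Constraint:
    left_operand: str   # e.g. "purpose"
    operator: str       # only "eq" is handled in this sketch
    right_operand: str

    def holds(self, context):
        if self.operator != "eq":
            raise ValueError(f"unsupported operator: {self.operator}")
        return context.get(self.left_operand) == self.right_operand


@dataclass
class Rule:
    kind: str           # "permission", "prohibition" or "duty"
    action: str
    target: str
    constraints: List[Constraint] = field(default_factory=list)


@dataclass
class Policy:
    rules: List[Rule]

    def allows(self, action, target, context):
        """Allow only if no prohibition matches and some permission matches
        with all of its constraints satisfied (assumed strategy)."""
        matching = [r for r in self.rules
                    if r.action == action and r.target == target]
        if any(r.kind == "prohibition" for r in matching):
            return False
        return any(
            r.kind == "permission" and all(c.holds(context) for c in r.constraints)
            for r in matching
        )


policy = Policy([
    Rule("permission", "reproduce", "langResource",
         [Constraint("purpose", "eq", "languageEngineeringResearch")]),
    Rule("prohibition", "distribute", "langResource"),
])

research_context = {"purpose": "languageEngineeringResearch"}
print(policy.allows("reproduce", "langResource", research_context))
print(policy.allows("reproduce", "langResource", {"purpose": "marketing"}))
print(policy.allows("distribute", "langResource", research_context))
```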
  • However, this creates a duality: the same license can be represented both in the structured ODRL form and in the flat META-SHARE form. The slide shows an example license in Turtle using the ODRL model and the META-SHARE model.

    A set of SPARQL queries can nevertheless easily transform one form into the other, as long as the ODRL structures are simple (not nested).
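
The transformation between the two forms can be illustrated with a small Python stand-in. The authors use SPARQL CONSTRUCT queries for this; the sketch below is not those queries, only the same mapping expressed over plain dictionaries. The mapping table follows the slide example (derive to noDerivatives, commercialize to nonCommercialUse); the function names and the dictionary layout are assumptions.

```python
# Assumed mapping from prohibited ODRL actions to flat condition labels,
# following the slide example.
PROHIBITION_TO_CONDITION = {
    "odrl:derive": ":noDerivatives",
    "odrl:commercialize": ":nonCommercialUse",
}


def odrl_to_flat(policy):
    """Flatten a simple (non-nested) policy into a sorted list of conditions of use."""
    conditions = []
    for rule in policy.get("prohibition", []):
        for action in rule["action"]:
            if action not in PROHIBITION_TO_CONDITION:
                raise KeyError(f"no flat equivalent known for {action}")
            conditions.append(PROHIBITION_TO_CONDITION[action])
    return sorted(conditions)


def flat_to_odrl(conditions, target):
    """Rebuild the prohibition part of a policy from flat conditions of use."""
    inverse = {flat: action for action, flat in PROHIBITION_TO_CONDITION.items()}
    actions = sorted(inverse[c] for c in conditions)
    return {"prohibition": [{"target": target, "action": actions}]}


# The :example0 policy from the slide, as a plain dictionary.
example0 = {
    "permission": [{"target": ":langResource", "action": ["odrl:reproduce"]}],
    "prohibition": [{"target": ":langResource",
                     "action": ["odrl:derive", "odrl:commercialize"]}],
}

flat = odrl_to_flat(example0)
print(flat)
print(flat_to_odrl(flat, ":langResource"))
```

Note that the round trip only recovers the prohibitions: the permission to reproduce has no flat counterpart in this mapping, which is one way to see why the two forms are not fully equivalent.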






  • An innovative feature of the Rights Information for Language Resources Ontology is the use of license templates.

    A license template is a license where some fields are left incomplete.
    These licensing patterns are public and immutable, and can be referenced again and again, avoiding verbose licenses.
  • Thus, two different resources may refer to the same license template with a single triple, while each of them specifies a different price.
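
The template mechanism can be sketched as follows. This is a plain-Python analogy, assuming a template is a structure with an open price field; the template content, the field names and the `instantiate` helper are hypothetical and do not reproduce the RDF representation.

```python
import copy

# Hypothetical template: a license whose price field is left open (None).
TEMPLATE = {
    "label": "example-commercial-template",
    "permission": {
        "action": "reproduce",
        "duty": {"action": "pay", "amount": None, "unit": "EUR"},
    },
}


def instantiate(template, amount):
    """Return a filled-in copy of the template; the shared template itself
    is never modified, mirroring the idea that templates are immutable."""
    filled = copy.deepcopy(template)
    filled["permission"]["duty"]["amount"] = amount
    return filled


# Two resources refer to the same template but specify different prices.
license_a = instantiate(TEMPLATE, 100)
license_b = instantiate(TEMPLATE, 250)

print(license_a["permission"]["duty"]["amount"])
print(license_b["permission"]["duty"]["amount"])
print(TEMPLATE["permission"]["duty"]["amount"])
```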
  • The schema of a META-SHARE license is presented in this slide.
    In this case, we are representing the META-SHARE Commercial NoRedistribution policy.

    It is represented by means of a license that has only one permission. This permission allows reproduction, making derivative works, commercial use and exercising database rights.
    However, there is a duty (attribution) and a prohibition (further distribution). Further, the permission is constrained to the purpose of language engineering research, and the location must be the assignee's site.

  • The same example, this time in Turtle, is shown in this slide. It is unabridged.
    The main resource in the license is an odrl:Policy (line 02), to which some metadata elements are attached: version (03),
    label (04), alternative name (05) and location of the legal code (10). The policy additionally
    has information regarding the language and a flat list of the conditions (ms:NoRedistribution,
    cc:Attribution, etc. in lines 07-09). The main permission (lines 12-25), which explicitly authorizes making derivative works and
    making commercial use, has the duty of attribution (15-17) and the constraints of being used only for
    language engineering purposes (lines 18-21) and on the users’ site (lines 21-24). Distribution is forbidden in lines 26-28.
  • And now we conclude the presentation.
    This paper has presented the Rights Information for Language Resources Ontology, specified in the framework of
    the W3C Linked Data for Language Technology Group. It is expected to enhance the accessibility of language resources, to ease the publication of licenses as linked data, and to enable the automatic processing of licenses by web services and other tools.
    The URI shown in the slide leads to the ontology specification, grounded with a formal list of requirements and illustrated with examples.

    In the future, we expect to improve the model, especially as regards user modelling, as well as to formalize constraints for the data structures. Finally, the use of SPARQL queries to move from the flat META-SHARE form to ODRL-like policies has to be further documented. The same applies to the CONSTRUCT queries capable of automatically filling in the license templates.
  • Thanks for the attention and we hope to be there in person the next time.
  • Digital representation of rights for language resources

    1. Victor Rodriguez-Doncel, Ontology Engineering Group, Universidad Politécnica de Madrid; Penny Labropoulou, Institute for Language and Speech Processing, Athena RC, Athens. Digital Representation of Rights for Language Resources. 4th Workshop on Linked Data in Linguistics: Resources and Applications, Beijing, China, 31st July 2015. Co-located with ACL-IJCNLP 2015
    2. XML, RDF. https://creativecommons.org/licenses/?lang=en
    3. Advantages of digital (RDF) licensing
         • Better understanding of the licensing terms by human users
         • Allows processing of the licensing terms by machines
         • Enhances the search and discovery of LRs
         • Allows easier management of the LRs by publishers
    4. License as metadata
    5. License as metadata in catalogues
    6. Some catalogues
         • OLAC
         • META-SHARE
         • CLARIN
         • CLARIN Virtual Language Observatory
         • Datahub.io
         • LREMap
         • Linghub
    7. Licensing info is metadata
         • Licensing info as free text
         • Licensing info as a choice among several possibilities
         • Licensing info as a more complex rights expression
    8. Rights information in the META-SHARE model. Stelios Piperidis. 2012. The META-SHARE language resources sharing infrastructure: Principles, challenges, solutions
    9. Rights information in the META-SHARE model
    10. META-SHARE metadata model
    11. Rights Information for Language Resources Ontology http://purl.org/NET/ms-rights#
    12. Other RELs: ccREL
    13. Other RELs: MPEG-21 MCO
    14. The ODRL Core Ontology
    15. :example0
            a odrl:Set ;
            odrl:permission [
                odrl:target :langResource ;
                odrl:action odrl:reproduce
            ] ;
            odrl:prohibition [
                odrl:target :langResource ;
                odrl:action odrl:derive, odrl:commercialize
            ] .

        :langResource :distribution :myDistribution .
        :myDistribution :license :myLicense .
        :myLicense :conditionsOfUse :noDerivatives , :nonCommercialUse .

        SPARQL CONSTRUCT queries
    16. License templates
    17. Use of license templates
    18. 01 <http://purl.org/NET/rdflicense/ms-c-nored>
        02   a odrl:Policy ;
        03   dct:hasVersion "1.0" ;
        04   rdfs:label "META-SHARE Commercial NoRedistribution" ;
        05   dct:alternative "MS C-NoReD" ;
        06   dct:language <http://www.lexvo.org/page/iso639-3/eng> ;
        07   ms:conditionsOfUse ms:noRedistribution, cc:Attribution,
        08     cc:CommercialUse, ms:conditionsOfUse,
        09     ms:languageEngineeringResearch ;
        10   cc:legalcode <http://www.meta-net.eu/meta-s[...etc...].pdf> ;
        11   ms:licenseCategory ms:PUB ;
        12   odrl:permission [
        13     odrl:action cc:Reproduction, cc:DerivativeWorks , odrl:extract,
        14       odrl:aggregate, cc:CommercialUse ;
        15     odrl:duty [
        16       odrl:action cc:Attribution ;
        17     ] ;
        18     odrl:constraint [
        19       odrl:operator odrl:eq ;
        20       odrl:purpose ms:languageEngineeringResearch
        21     ] , [
        22       odrl:operator odrl:eq ;
        23       odrl:spatial "only at assignee’s site"
        24     ]
        25   ];
        26   odrl:prohibition [
        27     odrl:action cc:Distribution ;
        28   ] .
    19. Rights information for language Resources http://purl.org/NET/ms-rights# Ontology specification, grounded with a formal list of requirements and illustrated with examples
    20. Victor Rodriguez-Doncel vrodriguez@fi.upm.es, Penny Labropoulou penny@ilsp.gr. -- Thanks -- 4th Workshop on Linked Data in Linguistics: Resources and Applications, Beijing, China, 31st July 2015. Co-located with ACL-IJCNLP 2015
