Linked lists represent a countable number of ordered values, and are among the most important abstract data types in computer science. With the advent of RDF as a highly expressive knowledge representation language for the Web, various implementations for RDF lists have been proposed. Yet, there is no benchmark so far dedicated to evaluate the performance of triple stores and SPARQL query engines on dealing with ordered linked data. Moreover, essential tasks for evaluating RDF lists, like generating datasets containing RDF lists of various sizes, or generating the same RDF list using different modelling choices, are cumbersome and unprincipled. In this paper, we propose List.MID, a systematic benchmark for evaluating systems serving RDF lists. List.MID consists of a dataset generator, which creates RDF list data in various models and of different sizes; and a set of SPARQL queries. The RDF list data is coherently generated from a large, community-curated base collection of Web MIDI files, rich in lists of musical events of arbitrary length. We describe the List.MID benchmark, and discuss its impact and adoption, reusability, design, and availability.
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
List.MID: A MIDI-Based Benchmark for RDF Lists
1. 1 Knowledge Representation & Reasoning, Computer Science Department
List.MID: A MIDI-Based
Benchmark for RDF Lists
Albert Meroño-Peñuela, Vrije Universiteit Amsterdam
@albertmeronyo
Enrico Daga, The Open University
ISWC 2019, 28 October, Auckland
2. 2
Modelling choices are often left to subjective choice
These practices and their reuse are key in query
performance
Lists are everywhere! Co-authors, timelines, media,
recipes, etc.
Knowledge Representation & Reasoning, Computer Science Department
MOTIVATION
3. 3
And in MIDI applications
Knowledge Representation & Reasoning, Computer Science Department
MOTIVATION
So what do we know about performance of RDF List solutions?
…
[ 144, 60, 100]
[ 128, 60, 64 ]
…
[Pic of music editing software]
5. 5
What RDF list models are common in LOD? How can we generate data
and queries to test their impact at retrieval in a systematic and
principled manner?
C1: Survey of common list modelling practices in RDF and LOD
C2: Benchmarking data and queries enabling their comparison
C3: Evidence of present and future use
Knowledge Representation & Reasoning, Computer Science Department
RESEARCH QUESTIONS & CONTRIBUTIONS
6. 6
Surveyed from:
W3C standards
The Ontology Design Patterns portal
List choices in RDF datasets from ISWC resource track papers
Linked Open Vocabularies (LOV)
LOD Laundromat/LOD-a-lot [Fernández et al. ISWC 2017]
Findings: various practices that generalize to 6 list modelling patterns
Knowledge Representation & Reasoning, Computer Science Department
LIST PATTERNS
7. 7 Knowledge Representation & Reasoning, Computer Science Department
RDF SEQUENCE AND RDF LIST
[SEQ]
[LIST]
10. 10
List.MID is an RDF list benchmark based on:
Collaborative GitHub awesome MIDI list
More than 300K MIDI files from the Web
MIDI Linked Data Cloud [Meroño et al. ISWC 2017]
Consists of:
RDF List data generator on top of real-world list data
Collection of extensible SPARQL list-retrieval queries
Knowledge Representation & Reasoning, Computer Science Department
LIST.MID: A BENCHMARK FOR RDF LISTS
11. 11
Custom version of midi2rdf algorithm [Meroño et al. ESWC 2016]
midi2rdf [-h]
[--format [{xml,n3,turtle,nt,pretty-xml,trix,trig,nquads,json-ld}]]
[--gz] [--order [{uri,prop_number,prop_time,seq,list,sop}]]
[--version]
filename [outfile]
Knowledge Representation & Reasoning, Computer Science Department
LIST.MID BENCHMARK DATA GENERATOR
Input
MIDI file
List pattern
to use
RDF
serialization
Compress?
12. 12
CQ1. Full list lookup: What is the ordered content of the list?
CQ2. N-th Lookup: Which is the n-th item in the list?
CQ3. Ordered Range: What are the n…m items in the list?
Aimed at supporting use-case LOD publishing
Focus on minimal and atomic operations related to list ordered access
Do not deal (yet) with list management (edit, merge, split, etc.)
Knowledge Representation & Reasoning, Computer Science Department
LIST.MID QUERIES
13. 13
[SEQ] WHERE {:list a midi:Track ; midi:hasEvents [ ?seq ?event ] .
BIND (xsd:integer(SUBSTR(str(?seq), 45)) AS ?index)
} ORDER BY ?index OFFSET <N> LIMIT <M-N+1>
[LIST] SELECT ?event (COUNT(?step) as ?index) WHERE {
:list a midi:Track ; midi:hasEvents ?events . ?events rdf:rest∗ ?step .
?step rdf:rest∗ ?elt . ?elt rdf:first ?event .
} GROUP BY ?event ORDER BY ?index rdf:rest{N}, /…{N}…/
[URI] WHERE { [] a midi:Track ; midi:hasEvent ?event .
BIND (xsd:integer(SUBSTR(str(?event), 77)) AS ?id) } ORDER BY ?id OFFSET…
[NUM/TIME] WHERE { [] a midi:Track ; midi:hasEvent ?event .
?event midi:absoluteTick ?tick . } ORDER BY ?tick
[SOP] WHERE { [] a midi:Track ; midi:hasEvent ?event . ?event sequence:precedes?
?next_event . ?next_event sequence:follows? ?event .
BIND (xsd:integer(SUBSTR(str(?event), 77)) AS ?id)
} ORDER BY ?id
Knowledge Representation & Reasoning, Computer Science Department
LIST.MID QUERIES
14. 14
[Daga et al. QuWeDa 2019]
Knowledge Representation & Reasoning, Computer Science Department
EVALUATION: PERFORMANCE EXPERIMENTS
Coherent results among triplestores
Critical impact of different models
rdf:List has poor performance
Property-based
better than link-based
rdf:Seq trade-off
15. 15
N=24 in W3C semantic-web, public-lod, VU, OU
Knowledge Representation & Reasoning, Computer Science Department
EVALUATION: PERFORMANCE EXPERIMENTS
16. 16
Lists are important! But how to assess the impact of their models?
6 common list patterns in RDF
List.MID: Real-world, MIDI-based RDF list benchmark data generator
List.MID: SPARQL queries for minimal and atomic list retrieval operations
First experiments and community interest
Limitations/future work:
Limited set of list operations (e.g. rdf:List could win in e.g. addition)
Extensions to other list models
Recommendations for triplestore optimization implementations
Knowledge Representation & Reasoning, Computer Science Department
CONCLUSIONS
17. 17
Questions, comments, suggestions
most welcome
@albertmeronyo
@enridaga
https://github.com/MIDI-LD/List.MID
Knowledge Representation & Reasoning, Computer Science
Department
THANK YOU
18. 18
• Motivation
• Related Work
• List Pattern Survey
• List.MID Benchmark Data Generator
• List.MID SPARQL Queries
• Evaluation: Performance Experiments
• Evaluation: Survey of Interest
• Conclusions
Knowledge Representation & Reasoning, Computer Science Department
OUTLINE
20. 20
Use this slide to place an image to
the left and text to the right.
With bullet
> Secundary list
To replace the image, right-click the
image (click on it with your other
mouse button), select Change
picture… and choose the new
image.
Knowledge Representation & Reasoning, Computer Science
Department