Information-Rich Programming in F# with Semantic Data

Programming with rich data frequently implies that one
needs to search for, understand, integrate and program with
new data - with each of these steps constituting a major
obstacle to successful data use.

In this talk we explain and demonstrate how our approach,
LITEQ (Language Integrated Types, Extensions and Queries for
RDF Graphs), which is realized as part of the F# / Visual Studio
environment, supports the software developer. Using the extended
IDE, the developer can now

a. explore new, previously unseen data sources,
which are either natively in RDF or mapped into RDF;
b. use the exploration of schemata and data in order to
construct types and objects in the F# environment;
c. automatically map between data and programming language objects in
order to make them persistent in the data source;
d. gain extended typing functionality in the F#
environment, resulting from the exploration of the data source
and its mapping into F#.
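
As a rough illustration of point b, the following Python sketch derives code types from a small schema description. It is not LITEQ's F# implementation: the `SCHEMA` and `DATA` dictionaries, the helper names, and the sample values are assumptions made for this example, and persistence (point c) is left out.

```python
# Hypothetical schema and data, loosely following the dog/owner example
# used later in the slides. All names and values here are assumptions.
SCHEMA = {
    "ex:Creature": {"parent": None, "properties": ["ex:hasName", "ex:hasAge"]},
    "ex:Dog": {"parent": "ex:Creature", "properties": ["ex:hasOwner", "ex:TaxNo"]},
    "ex:Person": {"parent": "ex:Creature", "properties": []},
}
DATA = {
    "ex:Hasso": {"ex:hasName": "Hasso", "ex:hasAge": 4,
                 "ex:hasOwner": "ex:Bob", "ex:TaxNo": 1234},
    "ex:Bob": {"ex:hasName": "Bob", "ex:hasAge": 42},
}


def local_name(uri):
    """Strip the namespace prefix, e.g. 'ex:Dog' -> 'Dog'."""
    return uri.split(":", 1)[1]


def make_getter(prop):
    """Read-only attribute that looks the value up in DATA on access."""
    return property(lambda self: DATA.get(self.uri, {}).get(prop))


def init(self, uri):
    self.uri = uri


def build_types(schema):
    """Create one Python class per schema type, preserving the hierarchy."""
    built = {}

    def build(uri):
        if uri not in built:
            spec = schema[uri]
            base = build(spec["parent"]) if spec["parent"] else object
            attrs = {local_name(p): make_getter(p) for p in spec["properties"]}
            if base is object:
                attrs["__init__"] = init
            built[uri] = type(local_name(uri), (base,), attrs)
        return built[uri]

    for uri in schema:
        build(uri)
    return built


types = build_types(SCHEMA)
dog = types["ex:Dog"]("ex:Hasso")
print(dog.hasName, dog.hasAge, dog.hasOwner)
```

The generated `Dog` class inherits `hasName` and `hasAge` from `Creature`, which mirrors the idea of carrying the schema's type hierarchy into the host language.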

Core to this approach is the novel node path query language, NPQL,
that allows for interactive, intuitive exploration of data schemata and
data proper as well as for the mapping and definition
of types, object collections and individual objects.
Beyond the existing type provider mechanism for F#,
our approach also allows for property-based navigation
and runtime querying for data objects.
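
To make the idea of a node path query more concrete, here is a minimal Python sketch of path navigation over a tiny in-memory triple set. It is a toy model, not NPQL itself: the method names `subtype`, `prop`, `suggestions` and `extension`, the sample triples, and the use of `rdfs:domain`/`rdfs:range` to resolve property steps are all assumptions made for illustration.

```python
RDF_TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"

# Hypothetical sample graph (schema and instance triples).
TRIPLES = {
    ("ex:Creature", SUBCLASS, "rdf:Resource"),
    ("ex:Dog", SUBCLASS, "ex:Creature"),
    ("ex:Person", SUBCLASS, "ex:Creature"),
    ("ex:hasOwner", DOMAIN, "ex:Dog"),
    ("ex:hasOwner", RANGE, "ex:Person"),
    ("ex:Hasso", RDF_TYPE, "ex:Dog"),
    ("ex:Bob", RDF_TYPE, "ex:Person"),
    ("ex:Alice", RDF_TYPE, "ex:Person"),
    ("ex:Hasso", "ex:hasOwner", "ex:Bob"),
}


def subtypes(cls):
    return sorted(s for s, p, o in TRIPLES if p == SUBCLASS and o == cls)


def properties(cls):
    return sorted(s for s, p, o in TRIPLES if p == DOMAIN and o == cls)


def instances(cls):
    # Simplification: only directly asserted types, no subclass inference.
    return {s for s, p, o in TRIPLES if p == RDF_TYPE and o == cls}


class NodePath:
    """A path through the schema graph, built step by step."""

    def __init__(self, start="rdf:Resource"):
        self.cls = start
        self.steps = []  # recorded (source class, property) steps

    def subtype(self, cls):
        if cls not in subtypes(self.cls):
            raise ValueError(f"{cls} is not a direct subtype of {self.cls}")
        self.cls = cls
        return self

    def prop(self, name):
        if name not in properties(self.cls):
            raise ValueError(f"{name} is not defined for {self.cls}")
        self.steps.append((self.cls, name))
        self.cls = next(o for s, p, o in TRIPLES if s == name and p == RANGE)
        return self

    def suggestions(self):
        """What an autocompletion list could offer at this point."""
        return {"subtypes": subtypes(self.cls), "properties": properties(self.cls)}

    def extension(self):
        """Instances reached by following the recorded property steps."""
        if not self.steps:
            return sorted(instances(self.cls))
        current = instances(self.steps[0][0])
        for _, name in self.steps:
            current = {o for s, p, o in TRIPLES if p == name and s in current}
        return sorted(current & instances(self.cls))


owners = NodePath().subtype("ex:Creature").subtype("ex:Dog").prop("ex:hasOwner")
print(owners.extension())
print(NodePath().subtype("ex:Creature").suggestions())
```

In this sketch the same path object serves both purposes named in the abstract: `suggestions` supports schema exploration, while `extension` retrieves the data objects the path denotes.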

Published in: Technology, Education

  1. Web Science & Technologies, University of Koblenz ▪ Landau, Germany: Information-Rich Programming in F# with Semantic Data
  2. Linked Open Data Cloud. "Where’s the Data in the Big Data Wave?" (Gerhard Weikum, SIGMOD Blog, 6.3.2013, http://wp.sigmod.org/): "… the Web of Linked Data consisting of more than 30 Billion RDF triples from hundreds of data sources …" WeST, Steffen Staab, staab@uni-koblenz.de
  3. Some "Bubbles" of the LOD Cloud
  4. RDF: Simple Foundations
  5. Example RDF Graph. Native graph, or mapped via R2RML: RDB to RDF Mapping Language (W3C Recommendation)
  6. Agenda • LITEQ: Language integrated types, extensions and queries for RDF graphs (exploring; programming, typing) • Evaluation of LITEQ (NPQL) against SPARQL (understandability; ease of use) • SchemEX (construction of schema-based index; schema induction)
  7. Programming against an unknown data source: exploring a data source, using a data source
  8. Example application • Goal: application that helps to collect dog license fees • Send email reminders to dog owners • Data is given as RDF graph
  9. Programmer's Task 1: Schema Exploration. Schema exploration & identification of important RDF types • Find RDF types representing dogs and persons
  10. Naive Approach, Task 1: Schema Exploration. Schema exploration & identification of important RDF types • Find RDF types representing dogs and persons • Tooling for naive approach: SPARQL query formulation
  11. Programmer's Task 2: Code Type Creation. Code type creation in host language • Convert the identified dog and person RDF types to code types in the host language
        type exCreature(uri) = class
          member this.hasName : String = …
          member this.hasAge : int = …
        end
        type exDog(uri) = class
          inherit exCreature(uri)
          member this.hasOwner : exPerson = …
          member this.TaxNo : int = …
        end
        type exPerson(uri) = class
          inherit exCreature(uri)
        end
  12. Programmer's Task 3: Data querying • Write a query that returns all dog owners
  13. Naive Approach, Task 3: Data querying • Write a query that returns all dog owners • Tooling for naive approach: SPARQL query formulation
  14. Naive Approach, Task 4: Object manipulation. Create the objects, manipulate them & make them persistent • Develop functionality around the query to send the reminder
        let queryString = "SELECT ?owner WHERE { ?dog rdf:type ex:Dog . ?dog ex:hasOwner ?owner }"
        dbConnection.evaluate(queryString)
        |> Seq.iter (fun uri ->
            let p = new Person(uri)
            sendReminderEmail(p))
  15. The LITEQ approach
  16. Node Path Query Language
  17. Graph Traversal with NPQL: Subtype Navigation (operator ">"). NPQL: rdf:Resource > ex:Creature
  18. Graph Traversal with NPQL: Property Navigation (operator "."). NPQL: ex:Dog . ex:hasOwner
  19. Extensional Semantics: Task 3, Querying for Owners. NPQL: rdf:Resource > ex:Creature > ex:Dog . ex:hasOwner -> Extension • Select ex:Dog • Walk through ex:hasOwner to ex:Person • Use extension to retrieve all persons who own dogs: ex:Bob
  20. Intensional Semantics: Task 2, Creating Person Code Type. NPQL: rdf:Resource > ex:Creature > ex:Dog . ex:hasOwner -> Intension • Select ex:Person node • "Intension" to get code type based on RDF type
        type exCreature(uri) = class
          member this.hasName : String = …
          member this.hasAge : int = …
        end
        type exPerson(uri) = class
          inherit exCreature(uri)
        end
  21. Autocompletion Semantics: Task 1, Exploration. NPQL: rdf:Resource > ex:Creature > • Suggestions during query writing, here ex:Person and ex:Dog • Instances based on extensional semantics • Types & properties based on intensional semantics
  22. Extensional Semantics: LA Conjunctive Queries. NPQL: ex:Dog <- ex:hasOwner • Left-associative conjunctive query with projection
  23. Host Language Extension: Task 4, Create Objects. Create the objects, manipulation & persistence • Develop the functionality around the query that will send the reminder using LITEQ in F# • Preliminary implementation in F#: http://west.uni-koblenz.de/Research/systems/liteq
  24. Web Science & Technologies, University of Koblenz ▪ Landau, Germany: Live demo of LITEQ in Visual Studio/F#
  25. Related Work
        Task | LINQ | XML Type Provider | Freebase Type Provider | LITEQ current version | LITEQ concept
        1 Schema exploration | - | (✔) per doc | (✔) only trees | ✔ | ✔
        2 Code type creation | - | (✔) erased types? | (✔) erased types | (✔) erased types | ✔ full hierarchy
        3 Data querying | ✔ | - | ((✔)) very limited expressiveness | (✔) limited expressiveness | ✔ no full SPARQL
        4 Object manipulation & persistence | (✔) | ✔ | - | ✔ no new object creation | ✔
  26. Future work wrt LITEQ • Current implementation is a prototype • Current implementation uses erased types: at runtime, no type hierarchy is present • Switch to generated types in the future: higher expressiveness in the host language by exploiting the type hierarchy • Optimizations of LITEQ implementation necessary • Lazy evaluation • Distinguish between design time and runtime • Not all types created at design time are needed at runtime • Formalize query language and investigate expressiveness
  27. Challenge: Joint Type Inference. Data modeling world (Description Logics, RDF) and program modeling world (ML type inference, UML class diagrams)
  28. Agenda • LITEQ: Language integrated types, extensions and queries for RDF graphs (exploring; programming, typing) • Evaluation of LITEQ (NPQL) vs. SPARQL (understandability; ease of use) • SchemEX (Where do I find relevant data? Efficient construction of a schema-level index)
  29. Preliminary Evaluation of LITEQ/NPQL. Focused on NPQL • Reason: test subjects lacked knowledge of F# and functional programming for evaluating LITEQ in full • Comparing NPQL against SPARQL. Main hypothesis of evaluation • NPQL with autocompletion allows for effective query writing in a more efficient manner than SPARQL. Thus: some of the advantages of LITEQ cannot show up in the evaluation!
  30. Evaluation Subjects. Evaluation with 11 participants • 1 subject eliminated a posteriori from the analysis because he could not deal with SPARQL at all • 10 subjects remaining for analysis. Participants • Undergraduate students • PhD students • Postdocs
  31. Evaluation: Setup. 1. Pre-questionnaire 2. Training in RDF, SPARQL & NPQL 3. Experimental tasks to be solved by subjects 4. Post-questionnaire
  32. Phase 1: Pre-Questionnaire, knowledge & skills • Programming: all "intermediate" or above • Object-orientation: 8 "intermediate" or above • Functional programming: 4 "intermediate" or above (Lisp, Haskell, F# (once)), 4 "none" • .NET: 1 "expert", 2 "beginner", 7 "none" • SPARQL: 3 "intermediate" or above [SPARQL experts], 7 below "intermediate" [SPARQL novices]
  33. Phase 2: Training in RDF, SPARQL, NPQL. Training in RDF & SPARQL • Presentation of RDF & SPARQL (20 minutes) • Practical exercise writing SPARQL queries in the web interface (5 minutes). Training in NPQL • Practical exercise writing NPQL queries in Visual Studio (5 minutes)
  34. Phase 3: Solving experimental tasks by subjects. 9 different experimental tasks to solve • Half of the tasks in NPQL using Visual Studio • Other half using SPARQL and a web interface. Task types • Navigation and exploration of a data source (Task 1) • Retrieving and answering questions about the data (Task 3) • 2 tasks were not solvable in NPQL: investigating how users deal with limits of the language. Evaluation measure • Duration to complete each task
  35. Evaluation across different user types
  36. Evaluations per Task
  37. Phase 4: Post-Questionnaire • "Do you want to explore a data source in your IDE?": 4 "yes", 3 "no, prefer separation of steps", 3 "no preference" • "NPQL is easier to use than SPARQL": 7 "agree" or above • Other: better support when writing queries in SPARQL; better response times for interactive working with NPQL • My conclusion: though LITEQ is still in a pre-alpha status, advantages became visible in the preliminary user evaluation
  38. Agenda • LITEQ: Language integrated types, extensions and queries for RDF graphs (exploring; programming, typing) • Evaluation of LITEQ (NPQL) against SPARQL (understandability; ease of use) • SchemEX (construction of schema-based index; schema induction)
  39. Searching the LOD cloud
        SELECT ?x WHERE { ?x rdf:type foaf:Document . ?x rdf:type swrc:InProceedings . ?x dc:creator ?y . ?y rdf:type fb:Computer_Scientist }
      [Figure: query graph with ?x typed as foaf:Document and swrc:InProceedings, connected via dc:creator to a node typed as fb:Computer_Scientist]
  40. Searching the LOD cloud
        SELECT ?x WHERE { ?x rdf:type foaf:Document . ?x rdf:type swrc:InProceedings . ?x dc:creator ?y . ?y rdf:type fb:Computer_Scientist }
      Index: • ACM • DBLP
  41. Schema-level index. Schema information on LOD • Explicit: assigning class types (Entity rdf:type Class) • Implicit: modelling attributes (Entity Property Entity 2)
  42. Schema-level index [Figure: example graph with classes C1, C2, C3, properties P1, P2, entities E1, E2, a node XYZ and data source DS1]
  43. Typecluster • Entities with the same set of types [Figure: type cluster TCj over classes C1, C2, ..., Cn, linked to data sources DS1, DS2, ..., DSm]
  44. Typecluster: Example [Figure: type cluster tc2309 for foaf:Document and swrc:InProceedings, linked to data sources DBLP and ACM]
  45. Bi-Simulation • Entities are equivalent if they refer with the same attributes to equivalent entities • Restriction: 1-Bi-Simulation [Figure: bisimulation class BSi over properties P1, P2, ..., Pn, linked to data sources DS1, DS2, ..., DSm]
  46. Bi-Simulation: Example [Figure: bisimulation class bs2608 for dc:creator, linked to data sources BBC and DBLP]
  47. SchemEX: Combination of TC and Bi-Simulation • Partition of TC based on 1-Bi-Simulation with restrictions on the destination TC [Figure: schema level with type clusters TCj and TCk, bisimulation BSi and equivalence classes EQC; payload level with data sources DS1, DS2, ..., DSm]
  48. SchemEX: Example [Figure: foaf:Document, swrc:InProceedings, fb:Computer_Scientist and dc:creator; index elements tc2309, tc2101, bs2608, eqc707; data source DBLP ...]
        SELECT ?x WHERE { ?x rdf:type foaf:Document . ?x rdf:type swrc:InProceedings . ?x dc:creator ?y . ?y rdf:type fb:Computer_Scient }
  49. SchemEX: Computation • Precise computation: brute force [Figure: schema and payload levels with type clusters, bisimulation, equivalence classes and data sources]
  50. Stream-based Computation of SchemEX • LOD crawler: stream of N-Quads (triple + data source) … Q16, Q15, Q14, Q13, Q12, Q11, Q10, Q9, Q8, Q7, Q6, Q5, Q4, Q3, Q2, Q1 • FiFo [Figure: FiFo queue holding numbered instances together with classes C1, C2, C3]
  51. Quality of Approximated Index • Stream-based computation vs. brute force • Data set of 11 million triples
  52. SchemEX @ BTC 2011. SchemEX • Allows complex queries (star, chain) • Scalable computation • High quality. Index over BTC 2011 data • 2.17 billion triples • Index: 55 million triples. Commodity hardware • VM: 1 core, 4 GB RAM • Throughput: 39,500 triples/second • Computation of full index: 15 h
  53. Future work wrt SchemEX. Further exploration of • schema induction • query federation. Federation vs. link-traversal-based query execution • Granularity of query execution • Too fine-grained: URI dereferencing • Too expressive: SPARQL • Sweet spot -> NPQL??
  54. Agenda • LITEQ: Language integrated types, extensions and queries for RDF graphs (exploring; programming, typing) • Evaluation of LITEQ (NPQL) against SPARQL (understandability; ease of use) • SchemEX (construction of schema-based index; schema induction)
  55. Future: 1. Searching for distributed data 2. Understanding distributed data 3. Intelligent queries on distributed data 4. Programming with distributed data • Type reuse • Type induction
  56. Web Science & Technologies, University of Koblenz ▪ Landau, Germany: Thank you for your attention!
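
The schema-level index described on the SchemEX slides can be approximated in a few lines of Python. This is a simplified sketch under stated assumptions, not the SchemEX implementation: the sample quads, the function names `build_index` and `lookup`, and the choice of key (an entity's type set plus its outgoing properties paired with the type set of each target) are made up for illustration. It computes the index by brute force and ignores the stream-based FiFo approximation.

```python
from collections import defaultdict

# Hypothetical N-Quads: (subject, predicate, object, data source).
QUADS = [
    ("d:paper1", "rdf:type", "foaf:Document", "DBLP"),
    ("d:paper1", "rdf:type", "swrc:InProceedings", "DBLP"),
    ("d:paper1", "dc:creator", "d:alice", "DBLP"),
    ("d:alice", "rdf:type", "fb:Computer_Scientist", "DBLP"),
    ("a:paper2", "rdf:type", "foaf:Document", "ACM"),
    ("a:paper2", "rdf:type", "swrc:InProceedings", "ACM"),
    ("a:paper2", "dc:creator", "a:bob", "ACM"),
    ("a:bob", "rdf:type", "fb:Computer_Scientist", "ACM"),
    ("b:show1", "rdf:type", "foaf:Document", "BBC"),
    ("b:show1", "dc:creator", "b:carol", "BBC"),
]


def build_index(quads):
    """Map each equivalence class to the data sources containing its entities."""
    types = defaultdict(set)    # entity -> set of classes (its type cluster)
    links = defaultdict(set)    # entity -> {(property, target entity)}
    sources = defaultdict(set)  # entity -> data sources mentioning it as subject
    for s, p, o, src in quads:
        sources[s].add(src)
        if p == "rdf:type":
            types[s].add(o)
        else:
            links[s].add((p, o))

    index = defaultdict(set)
    for entity, srcs in sources.items():
        type_cluster = frozenset(types.get(entity, ()))
        # 1-step view: each property paired with the target's type cluster.
        edges = frozenset(
            (p, frozenset(types.get(target, ())))
            for p, target in links.get(entity, ())
        )
        index[(type_cluster, edges)].update(srcs)
    return index


def lookup(index, wanted_types, wanted_edges=()):
    """Return data sources whose equivalence classes match the query pattern."""
    hits = set()
    for (type_cluster, edges), srcs in index.items():
        if not set(wanted_types) <= type_cluster:
            continue
        if all(
            any(p == q and set(target_types) <= t for q, t in edges)
            for p, target_types in wanted_edges
        ):
            hits |= srcs
    return sorted(hits)


index = build_index(QUADS)
print(lookup(
    index,
    {"foaf:Document", "swrc:InProceedings"},
    [("dc:creator", {"fb:Computer_Scientist"})],
))
```

With this sample data, the lookup for documents that are also `swrc:InProceedings` and have a `dc:creator` typed as `fb:Computer_Scientist` only reports the data sources worth querying, which is the role the index plays in the "Searching the LOD cloud" slides.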
