
The Future is Federated

Invited talk at #VIVO16


  1. The future is federated. Ruben Verborgh
  2. I think Big Data is boring.
  3. Big Data thrives on centralization.
  4. Knowledge is inherently distributed.
  5. Knowledge is inherently heterogeneous.
  6. Knowledge on the Web is inherently linked.
  7. Centralization skips the most interesting problems.
  8. Where to find the data you need? How to access them? How to integrate them?
  9. Let’s create smart apps over VIVO and Web data.
  10. You’ll get to see 3 things: a light interface to VIVO data, queries over that interface, and an app built on such queries.
  11. We can integrate multiple data sources on the live Web, but we need to set our expectations right.
  12. The future is federated: Big Data fails at Web scale. Light interfaces rule. Engineer for serendipity.
  13. The future is federated: Big Data fails at Web scale. Light interfaces rule. Engineer for serendipity.
  14. RDF: THE DATA LANGUAGE
  15. A triple: <subject> <predicate> <object>.
  16. SPARQL: THE QUERY LANGUAGE
  17. SPARQL: THE PROTOCOL
  18. Diagram: a client sends a SPARQL query to a SPARQL endpoint over the SPARQL protocol.
  19. Hey, SPARQL endpoint… SELECT ?person ?name WHERE { ?person a dbo:Scientist. ?person rdfs:label ?name. ?person dbo:birthPlace dbp:Denver. } Sure!
  20. Hey, SPARQL endpoint… SELECT DISTINCT ?drug ?drug1 ?drug2 ?drug3 ?drug4 ?d1 WHERE { ?drug1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/drugCategory> <http://www4.wiwiss.fu-berlin.de/drugban ?drug2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/drugCategory> <http://www4.wiwiss.fu-berlin.de/drugban ?drug3 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/drugCategory> <http://www4.wiwiss.fu-berlin.de/drugban ?drug4 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/drugCategory> <http://www4.wiwiss.fu-berlin.de/drugban ?drug1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target> ?o1 . ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/genbankIdGene> ?g1 . ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/locus> ?l1 . ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/molecularWeight> ?mw1 . ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/hprdId> ?hp1 . ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/swissprotName> ?sn1 . ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/proteinSequence> ?ps1 . ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/generalReference> ?gr1 . ?drug <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target> ?o1 . ?drug2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target> ?o2 . ?o1 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/genbankIdGene> ?g2 . ?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/locus> ?l2 . ?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/molecularWeight> ?mw2 . ?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/hprdId> ?hp2 . ?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/swissprotName> ?sn2 . ?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/proteinSequence> ?ps2 . ?o2 <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/generalReference> ?gr2 . ?drug <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/target> ?o2 . Sure!
  21. SPARQL endpoints try to be the Web’s Big Data processors, for free.
  22. Can I SPARQL your endpoint? Few endpoints exist, and the average endpoint is down for 1.5 days/month.
  23. Big Data fails at Web scale, because Web scale is much bigger.
  24. SEMANTIC WEB SHOULDN’T TRY TO COMPETE WITH BIG DATA
  25. I WANT TO PUT THE WEB BACK INTO SEMANTIC WEB. IT’S OUR MAIN DIFFERENTIATOR FROM BIG DATA.
  26. IF IT’S NOT WEB, I’M NOT INTERESTED. That’s why I think Big Data is boring.
  27. The future is federated: Big Data fails at Web scale. Light interfaces rule. Engineer for serendipity.
  28. What would the AVERAGE HUMAN do?
  29. AVERAGE HUMAN: SELECT ?person ?name WHERE { ?person a dbo:Scientist. ?person rdfs:label ?name. ?person dbo:birthPlace dbp:Denver. } You can use only Wikipedia.
  30. AVERAGE HUMAN: Which scientists were born in Denver? You can use only Wikipedia.
  31. AVERAGE HUMAN: 1. visit the page about Denver; 2. make a list of people born there; 3. read their pages to see if they’re a scientist. You can use only Wikipedia.
  32. WEB LINKING IS UNIDIRECTIONAL: a Denver person’s page links to Denver, but Denver doesn’t necessarily link to that person.
  33. AVERAGE HUMAN: 1. visit the page about Denver; 2. make a list of people born there; 3. read their pages to see if they’re a scientist. You can use only Wikipedia.
  34. We need to empower the AVERAGE HUMAN, but please not with a SPARQL endpoint, because they’re so expensive to keep up.
  35. WHAT IS THE SIMPLEST COMPLEXITY?
  36. THE ESSENCE OF RDF: <subject> <predicate> <object>.
  37. THE ESSENCE OF LINKED DATA: ?subject <predicate> <object>.
  38. THE ESSENCE OF LINKED DATA: Denver <predicate> <object>.
  39. THE ESSENCE OF TPF: ?subject ?predicate ?object.
  40. THE ESSENCE OF TPF: ?subject ?predicate Denver.
  41. TRIPLE PATTERN FRAGMENTS
  42. Clients can ask the server only for triple patterns.
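
As a rough illustration of what "only triple patterns" means on the server side, here is a minimal Python sketch. It is not the actual TPF server implementation: the in-memory `TRIPLES` list, the `match_pattern` function, and the sample names are all assumptions made for this example, and `None` stands in for a variable such as `?subject`.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Tiny illustrative dataset (placeholder names, not real DBpedia data).
TRIPLES: List[Triple] = [
    ("Alice", "birthPlace", "Denver"),
    ("Alice", "type", "Scientist"),
    ("Bob", "birthPlace", "Denver"),
    ("Carol", "birthPlace", "Boston"),
]


def match_pattern(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple matching the pattern; None acts as a variable."""
    for s, p, o in triples:
        if subject is not None and subject != s:
            continue
        if predicate is not None and predicate != p:
            continue
        if obj is not None and obj != o:
            continue
        yield (s, p, o)


# "?subject ?predicate Denver." from the slide above.
about_denver = list(match_pattern(TRIPLES, obj="Denver"))
```

Everything more complex than this single lookup is left to the client, which is what keeps such an interface cheap to host.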
  43. AVERAGE HUMAN: Which scientists were born in Denver? You can only use a TPF interface of DBpedia.
  44. AVERAGE HUMAN: 1. “?person birthPlace Denver.” 2. “?person type Scientist.” 3. “?person fullName ?name.” You can only use a TPF interface of DBpedia.
  45. AVERAGE MACHINE: 1. “?person birthPlace Denver.” 2. “?person type Scientist.” 3. “?person fullName ?name.” You can only use a TPF interface of DBpedia.
  46. AVERAGE MACHINE: SELECT ?person ?name WHERE { ?person a dbo:Scientist. ?person rdfs:label ?name. ?person dbo:birthPlace dbp:Denver. } You can only use a TPF interface of DBpedia.
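
The three steps the machine takes can be sketched as a small client-side join in Python. This is a simplified illustration under stated assumptions, not the algorithm of any real TPF client: `DATA` is a hypothetical in-memory stand-in for the server, `fragment` is the only "server" operation, and `solve` is a made-up name for a naive nested-loop evaluation.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]

# Hypothetical stand-in for a TPF server (placeholder data, not DBpedia).
DATA: List[Triple] = [
    ("Alice", "birthPlace", "Denver"),
    ("Alice", "type", "Scientist"),
    ("Alice", "fullName", "Alice Example"),
    ("Bob", "birthPlace", "Denver"),
    ("Bob", "type", "Musician"),
    ("Bob", "fullName", "Bob Example"),
    ("Carol", "birthPlace", "Boston"),
    ("Carol", "type", "Scientist"),
    ("Carol", "fullName", "Carol Example"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def fragment(pattern: Triple) -> Iterator[Triple]:
    """The only server operation: triples matching one triple pattern."""
    for triple in DATA:
        if all(is_var(p) or p == t for p, t in zip(pattern, triple)):
            yield triple


def solve(patterns: List[Triple], bindings: Optional[Bindings] = None) -> Iterator[Bindings]:
    """Answer a conjunction of patterns, one pattern request at a time."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    # Substitute values found so far, so the next request is more selective.
    bound = tuple(bindings.get(term, term) for term in first)
    for triple in fragment(bound):
        extended = dict(bindings)
        extended.update({p: t for p, t in zip(bound, triple) if is_var(p)})
        yield from solve(rest, extended)


QUERY: List[Triple] = [
    ("?person", "birthPlace", "Denver"),
    ("?person", "type", "Scientist"),
    ("?person", "fullName", "?name"),
]
results = list(solve(QUERY))
```

The join happens entirely on the client; the server only ever sees simple pattern requests, which is the trade-off the next slides discuss.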
  47. The future is federated: Big Data fails at Web scale. Light interfaces rule. Engineer for serendipity.
  48. Engineer for serendipity. (Roy T. Fielding)
  49. Federated queries with SPARQL endpoints pose a problem. If 1 endpoint is down for 1.5 days each month, then 2 endpoints might be down for 3 days each month.
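
The arithmetic behind that estimate can be worked through in a few lines of Python. This is a back-of-the-envelope sketch, not a measurement: it assumes a 30-day month and that endpoint outages are independent, and the function names are made up for the example. Three days is the worst case, where the two outages never overlap; under independence the expected figure is slightly lower.

```python
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def worst_case_downtime_days(per_endpoint_days: float, endpoints: int) -> float:
    """Downtime if outages never overlap (capped at the whole month)."""
    return min(per_endpoint_days * endpoints, DAYS_PER_MONTH)


def expected_downtime_days(per_endpoint_days: float, endpoints: int) -> float:
    """Expected time (in days) with at least one endpoint down,
    assuming outages are independent of each other."""
    p_up = 1 - per_endpoint_days / DAYS_PER_MONTH
    return DAYS_PER_MONTH * (1 - p_up ** endpoints)


worst = worst_case_downtime_days(1.5, 2)    # 3.0 days
expected = expected_downtime_days(1.5, 2)   # about 2.9 days
```

Either way, a federated query that needs every endpoint to be up becomes less reliable with each source it adds.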
  50. Federated queries are native to TPF clients. Just ask each of the questions to different TPF servers.
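
A minimal Python sketch of that idea: the client sends the same triple pattern to several servers and merges the answers. The `SERVERS` dictionary, the `.example` host names, and both function names are hypothetical, and real clients would issue HTTP requests rather than read in-memory lists.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

# Hypothetical servers, each holding part of the data (names are made up).
SERVERS: Dict[str, List[Triple]] = {
    "places.example": [
        ("Alice", "birthPlace", "Denver"),
        ("Bob", "birthPlace", "Boston"),
    ],
    "people.example": [
        ("Alice", "fullName", "Alice Example"),
        ("Bob", "fullName", "Bob Example"),
    ],
}


def fragment(server: str, pattern: Pattern) -> Iterator[Triple]:
    """Triples on one server that match the pattern; None is a variable."""
    for triple in SERVERS[server]:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


def federated_fragment(pattern: Pattern) -> Iterator[Triple]:
    """Ask every server the same question and merge the answers."""
    for server in SERVERS:
        yield from fragment(server, pattern)


# Birthplaces live on one server, names on another; the client does not care.
born_in_denver = [s for s, _, _ in federated_fragment((None, "birthPlace", "Denver"))]
names = [
    o
    for person in born_in_denver
    for _, _, o in federated_fragment((person, "fullName", None))
]
```

Because each request is just a triple pattern, adding a source only means adding one more server to ask.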
  51. TPF trades server cost for query performance. But in federated scenarios, performance can be on par with SPARQL endpoints!
  52. TPF is not the final solution (no API will ever be), but an excellent starting point. Lightweight interfaces are easy to extend and combine with others.
  53. The Memento protocol brings time to the Web. Ask for representations at a certain point in the past.
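
In Memento (RFC 7089), a client asks for a past representation by sending an `Accept-Datetime` request header with an HTTP date. The Python sketch below only builds that header; the `memento_headers` helper is a hypothetical name, and the date is an arbitrary example. No request is sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_headers(moment: datetime) -> dict:
    """Build the request header used for datetime negotiation."""
    # HTTP dates are expressed in GMT, so normalise to UTC first.
    utc_moment = moment.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc_moment, usegmt=True)}


headers = memento_headers(datetime(2007, 5, 31, 20, 35, tzinfo=timezone.utc))
# headers == {"Accept-Datetime": "Thu, 31 May 2007 20:35:00 GMT"}
```

A client could attach these headers to an ordinary HTTP GET; a server that supports Memento would then select the version closest to the requested moment.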
  54. TPF and Memento are a great match. We combined them in collaboration with Herbert Van de Sompel & team at the Los Alamos National Laboratory.
  55. The future is federated: Big Data fails at Web scale. Light interfaces rule. Engineer for serendipity.
  56. VIVO today (diagram): client, TPF server, SPARQL, VIVO.
  57. VIVO tomorrow? (diagram): client, TPF, VIVO.
  58. Federation is a game changer.
  59. Federation is a game changer with the TPF interface.
  60. With great power comes great responsibility.
  61. We need to be realistic about our expectations.
  62. Some queries will always be hard on an open Web. You might need centralization if you want answers fast.* (*Terms and conditions apply.)
  63. Many more queries than you’d think are pretty fast… and streaming!
  64. OPEN SOURCE: linkeddatafragments.org
  65. The future is federated, and it starts today. @RubenVerborgh