Introduction to the FP7 CODE project @ BDBC

The FP7 CODE project will be presented at the Big Data Benchmarking Community call. This high-level overview introduces CODE's vision and shows the progress made after six months.

  1. Big Data Benchmarking Community Call. CODE: Commercial Empowered Linked Open Data Ecosystem in Research. Presented by Florian Stegmaier, University of Passau, 2012-10-04.
  2. Some basic facts...
     • Budget: €2.4M, funded by the European Commission
     • Started in May 2012 with a runtime of 2 years
  3. Current situation
     • Data is being produced at an immense rate:
       • Evaluation campaigns (e.g., CLEF campaign)
       • Benchmarking communities (e.g., TPC)
       • Researchers (e.g., proceedings, journals or slides)
     Most data remains unstructured, and sophisticated access methods are missing!
  4. Why is this a problem?!
     • From a data perspective...
       • ...what is the quality of data?
       • ...how to deal with missing values?
     • From a user perspective...
       • ...can I compare this data? Baseline?
       • ...are there contradicting facts?
     The semantics of documents must be unleashed to make them accessible and processable!
  5. The long way to knowledge...
  6. Step 1: Analyze data
     • Analysis of documents has to:
       • Find structural elements (TOC, images, etc.)
       • Extract facts and numerical measures
       • Disambiguate facts (from "string" to "object")
     • Automatic annotation is defective or incomplete
       • Crowdsourced annotation of documents
       • Marketplace offers revenue for expert knowledge
  7. Step 2: Lift and extend data
     • Extracted and disambiguated data will be lifted into the Linked Data cloud
       • Interlink with data already existing in the cloud
       • Enrich data with provenance information (increase quality estimations)
       • Perform OLAP queries on data cubes (e.g., time series)
     Enriched and aggregated data is exposed as a Linked Data endpoint.
  8. Step 3: Interact with data
     • Query wizard will focus on:
       • Excel-based interaction possibilities
       • On-the-fly creation of statistical analyses on a federated dataset
     • Marketplace encourages users to interact with data
     Non-IT (but maybe domain) experts are able to create visual analytics as well as new data cubes.
  9. ...does all this actually work? Current analysis of PDFs is able to discover a basic table of contents, the reading direction, as well as specific objects.
  10. ...are the users involved? One possible way to engage users in annotating data is the Mendeley Desktop (early stage).
  11. ...what about lifting data? A basic triplification chain has been established to lift table-based data into a Semantic Web compatible data cube.
  12. ...can I find data? The first prototype of the query wizard is able to show and interact with retrieved data in an Excel-like manner.
  13. ...and the marketplace? The data can be exposed in several ways, just like in mind maps, to help get things structured (example shows bigger plate).
  14. Thank you for your attention! #CODEresearchEU (Twitter). Thanks to Michael Granitzer, Christin Seifert, Kai Schlegel and Sebastian Bayerl for supporting me with input, figures and slide templates and, last but not least, our consortium for the prototype screenshots ;)
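
Slides 7 and 11 mention a triplification chain that lifts table-based data into a Semantic Web compatible data cube, on which OLAP queries can then be run. The slides do not show how CODE implements this, so the following is only a minimal sketch of the general idea. It assumes the W3C RDF Data Cube vocabulary (`qb:Observation`, `qb:dataSet`) as the target model and emits plain N-Triples lines. The `example.org` namespace, the function names `triplify` and `roll_up`, the column names and the sample values are all hypothetical and invented for illustration.

```python
from collections import defaultdict

# Hypothetical base namespace; the project's real URIs are not given in the slides.
BASE = "http://example.org/code/"
# Terms from the W3C RDF Data Cube vocabulary and RDF/XSD namespaces.
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def escape(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"')


def triplify(rows, dimensions, measure, dataset="dataset1"):
    """Turn table rows (dicts) into N-Triples lines, one qb:Observation per row."""
    lines = []
    for i, row in enumerate(rows):
        obs = f"<{BASE}{dataset}/obs{i}>"
        lines.append(f"{obs} <{RDF_TYPE}> <{QB}Observation> .")
        lines.append(f"{obs} <{QB}dataSet> <{BASE}{dataset}> .")
        # Dimension values are kept as plain string literals in this sketch.
        for dim in dimensions:
            lines.append(f'{obs} <{BASE}dimension/{dim}> "{escape(row[dim])}" .')
        # The measure is typed as xsd:decimal.
        lines.append(
            f'{obs} <{BASE}measure/{measure}> "{escape(row[measure])}"^^<{XSD_DECIMAL}> .'
        )
    return lines


def roll_up(rows, dimension, measure):
    """OLAP-style roll-up: sum the measure over all rows sharing a dimension value."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += float(row[measure])
    return dict(totals)


# Made-up sample table (not data from the project).
rows = [
    {"system": "A", "year": "2011", "score": 0.5},
    {"system": "B", "year": "2011", "score": 0.25},
    {"system": "A", "year": "2012", "score": 0.5},
]

triples = triplify(rows, dimensions=["system", "year"], measure="score")
totals_by_year = roll_up(rows, dimension="year", measure="score")
```

Each table row becomes one observation with five triples (type, dataset link, two dimensions, one measure). A fuller pipeline would presumably also disambiguate the dimension values into URIs, as slide 6 describes, instead of leaving them as strings.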