Steffen Staab Programming with Semantic Broad Data 1
Programming with Semantic Broad Data
Steffen Staab
Institute for Web Science and Technologies · University of Koblenz-Landau, Germany &
Web and Internet Science Group · ECS · University of Southampton, UK
@ststaab
west.uni-koblenz.de
Steffen Staab Programming with Semantic Broad Data 2
The World of Big Data – Volume & Velocity
Genome data
• Up to 200 GB/person
Video data
• Upload 300 hrs/min
Sensor data
• 5000 sensors/jet engine
• 1 terabit/s
360 TB/disc
https://flic.kr/p/8zuDTm
https://flic.kr/p/59jc2h
Steffen Staab Programming with Semantic Broad Data 3
The World of Big Data – Volume & Velocity
Genome data
• Up to 200 GB/person
Video data
• Upload 300 hrs/min
Sensor data
• 5000 sensors/jet engine
• 1 terabit/s
https://flic.kr/p/8zuDTm
https://flic.kr/p/59jc2h
18 concepts
Noise amplitudes
Steffen Staab Programming with Semantic Broad Data 4
The World of Big Data – Variety
Data models
• Graph data
• Relational
• XML
• RDF
• CSV
• JPEG
• MPEG-1, 2, 4
• DICOM
• PDF
• Excel
• ...
Conceptual models
aka ER schemata
aka Logical schemata
aka XML schemata
aka RDFS / OWL ontologies
FOAF, Dublin Core, MARC 21, Unifact, ...
Dozens to hundreds
Steffen Staab Programming with Semantic Broad Data 5
The World of Big Data – Variety – 15 years ago
SAP
• On the order of 10,000 'concepts'
• Days to find the right column
Medical information system (Lars)
• Treating transplant patients
• Approx. 10,000 concepts
Only my very limited experience
Big consulting business
Steffen Staab Programming with Semantic Broad Data 6
The World of Big Data – Variety – Today!
Wikidata
• 1,148,230 concepts
• 2515 relations
UMLS
• 1 million concepts
Bioinformatics
• 1000s of public databases
• 35 in Bio2RDF (11 billion triples)
eGov datasets
• 200,000 by Fraunhofer FOKUS
• 20,000 by ODI
Knowledge Graphs
• Ask Google, Microsoft, Samsung, HP, ...
Sensor types
• 330 broad types in Wikipedia
• Tens of thousands
How to write valid, robust programs?
How to find data?
Steffen Staab Programming with Semantic Broad Data 7
How to write a valid, robust program?
SELECT ?x
WHERE
{
?x a CONCEPT15
}
SELECT ?x
WHERE
{
?x a CONCEPT151735
}
https://flic.kr/p/8zuDTm
18 concepts
1,166,040 concepts (Sept. '16)
1,148,230 concepts (March '16)
Steffen Staab Programming with Semantic Broad Data 8
How to approach big data
In the following I am guessing
what Axel Polleres might have told you
about Enterprise Linked Data
Steffen Staab Programming with Semantic Broad Data 9
Traditional Information Architecture
Business Logic
Structured Data
Unstructured Data
Presentation and Interaction
Characteristics:
• Processes are known
• Data structures are known
• Meaning of data primarily in schema and code
Steffen Staab Programming with Semantic Broad Data 10
Big Data in Today's Information Architecture
Characteristics:
• Little structure
• Semi-structured data
• Meaning of data of primary importance!
Steffen Staab Programming with Semantic Broad Data 11
Variety Issue 1: Data Models
Data Models:
• Relational
• Tree (XML,...)
• Document oriented
• Stream
• Array
• Graph-DB
RDF
Graph data model as
common denominator
Steffen Staab Programming with Semantic Broad Data 12
Dealing with Issue 1: RDF as Data Model
RDF
Graph data model as
common denominator
Example graph:
• Bowie knows Sarandon
• Bowie bornOn 8-1-1947
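As a rough illustration of the graph data model on this slide, the two edges can be written as subject-predicate-object triples and queried by pattern. This is a minimal Python sketch, not an RDF library: the `TRIPLES` set and the `match` helper are hypothetical names introduced here, and the direction of the `knows` edge is assumed.

```python
# Hypothetical in-memory triple set; node and edge names follow the slide.
TRIPLES = {
    ("Bowie", "knows", "Sarandon"),   # edge direction is an assumption
    ("Bowie", "bornOn", "8-1-1947"),
}


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    # Everything stated about Bowie, regardless of predicate.
    for triple in match(TRIPLES, s="Bowie"):
        print(triple)
```

Because every statement has the same three-part shape, data from relational, tree, or document sources can be flattened into one such set, which is the sense in which the graph model acts as a common denominator.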
Steffen Staab Programming with Semantic Broad Data 13
Variety Issue 2: Conceptual Models
Conceptual Models:
• ER
• UML
• ...
RDFS
Ontology as common
denominator
Steffen Staab Programming with Semantic Broad Data 14
Variety Issue 2: RDFS as common conceptual meta model
RDFS
for explicit conceptual
description
Example graph:
• Bowie knows Sarandon
• Bowie bornOn 8-1-1947
• Bowie type MusicArtist
• Sarandon type Actor
Steffen Staab Programming with Semantic Broad Data 15
Variety Issue 3: System Boundaries
IRIs
for globally unique
referencing
Example graph:
• m:Bowie f:knows d:Sarandon
• m:Bowie m:bornOn 8-1-1947
• m:Bowie rdf:type m:MusicArtist
• d:Sarandon rdf:type d:Actor
Prefixes:
m = http://musicbrainz.org
d = http://dbpedia.org
f = http://xmlns.com/foaf/0.1/
rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns#
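To make the role of prefixes concrete, the following sketch expands a prefixed name such as `f:knows` into a full IRI. It is an assumption-laden illustration: the `PREFIXES` table and the `expand` function are hypothetical, and a trailing slash is added to the abbreviated `m` and `d` namespaces from the slide so that concatenation yields a readable IRI.

```python
# Prefix table modeled on the slide; trailing slashes on m and d are assumed.
PREFIXES = {
    "m": "http://musicbrainz.org/",
    "d": "http://dbpedia.org/",
    "f": "http://xmlns.com/foaf/0.1/",
}


def expand(qname, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full IRI using the prefix table."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {qname!r}")
    return prefixes[prefix] + local


if __name__ == "__main__":
    print(expand("f:knows"))
    print(expand("m:Bowie"))
```

Two systems that both mention `Bowie` can then be told apart, or deliberately linked, because each name resolves to a globally unique identifier.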
Steffen Staab Programming with Semantic Broad Data 16
A Practical Perspective on
Broad Data with LITEQ
Steffen Staab Programming with Semantic Broad Data 17
Drosophila: Linked Open Data Cloud
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Dozens of domains
Hundreds of data sources
Thousands of concepts
Millions of entities
Billions of triples
Semantic
Broad
Data
Steffen Staab Programming with Semantic Broad Data 18
Programming with Linked Data
Steffen Staab Programming with Semantic Broad Data 19
Programming with Linked Data
Tasks of the Programmer
1 Schema exploration
2 Programming code types
3 Programming queries
4 Programming procedures for creating, manipulating, and persisting objects
Steffen Staab Programming with Semantic Broad Data 20
Node Path Query Language Using Autocompletion
Exploration of classes
Steffen Staab Programming with Semantic Broad Data 21
Node Path Query Language Using Autocompletion
Exploration of classes
Exploration of relations
Steffen Staab Programming with Semantic Broad Data 22
Node Path Query Language: Query Formulation
Exploration of classes
Exploration of relations
Querying for instances
Type: set of mo:MusicArtist
No definition or declaration needed
Steffen Staab Programming with Semantic Broad Data 23
Node Path Query Language for Code Development
Exploration of classes
Exploration of relations
Querying for instances
Developing code with queries
All translated into SPARQL queries at
• Development time
• Type inference at compile time (but also as part of IDE)
• Querying again at run time
One language to bind them all
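The slides do not show the translation rules, so the following is only a guess at the simplest case: asking for the extension of a class becomes a SPARQL `SELECT` over `rdf:type`, in the same shape as the queries on the earlier "How to write a valid, robust program?" slide. The function `extension_query` is a hypothetical name; the Music Ontology namespace is the published one.

```python
# Music Ontology namespace, used for the mo: prefix in the slides' examples.
MO = "http://purl.org/ontology/mo/"


def extension_query(prefix, namespace, local_name):
    """Build a SPARQL query selecting all instances of prefix:local_name."""
    return (
        f"PREFIX {prefix}: <{namespace}>\n"
        f"SELECT ?x WHERE {{ ?x a {prefix}:{local_name} }}"
    )


if __name__ == "__main__":
    # Query text that could be sent at development, compile, or run time.
    print(extension_query("mo", MO, "MusicArtist"))
```

The same generated query text can serve autocompletion in the IDE, type inference at compile time, and data access at run time, which is one way to read "one language to bind them all".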
Steffen Staab Programming with Semantic Broad Data 24
Node Path Query Language for Code Development
Exploration of classes
Exploration of relations
Querying for instances
Developing code with queries
Developing code with new classes
All translated into SPARQL queries at
• Development time
• Run time update
• Persistence!
Steffen Staab Programming with Semantic Broad Data 25
Formal NPQL Syntax
Data browsing
Restricting Class Expressions
Evaluating Class Expressions
Navigating from Data to Classes
Navigating from Data to Property Types
URI set
Intensional Queries
Extensional Queries
Navigational Queries
Steffen Staab Programming with Semantic Broad Data 27
NPQL Algebra (Example)
Reversibility
can be used to simplify path expressions.
Steffen Staab Programming with Semantic Broad Data 28
Summary on LITEQ
Language Integrated Types, Extensions, and Queries
NPQL (Node Path Query Language)
• Navigational Queries
• Intensional Queries
• Extensional Queries
• Compilation to SPARQL
LITEQ
• Implementation of NPQL as F# Type Provider in Visual Studio
• Autocompletion using NPQL queries
• Automatic typing of extensional query results by intensional queries
Steffen Staab Programming with Semantic Broad Data 29
"That seems to work very well in practice,
but how does it work in theory?"
let allArtists =
    Store.NPQL().``mo:MusicArtist``.Extension
What is implied by such a line...
...for the programme?
...for the compiler?
Steffen Staab Programming with Semantic Broad Data 30
A Foundational Perspective on
Semantic Broad Data Using DL
Steffen Staab Programming with Semantic Broad Data 31
What we want to have: Static Type Checking
But:
• In LITEQ: Queries must receive types
• Number of types in our system very/infinitely large
• Existing type systems expect complete knowledge
Programming with Data from a Knowledge Base
Issue in our prototype
Steffen Staab Programming with Semantic Broad Data 32
Related Work
Generic Types
• Everything is a node or an edge
• No type checking!
→ Only 2nd place in Halo competition
Mapping approaches
• Hibernate
• LITEQ
• ActiveRDF
• Summer / Winter
• ...
Preferred in SemWeb now
Been there, done that
Steffen Staab Programming with Semantic Broad Data 33
Example – and Issues with Mapping
Mapping DL types to PL types problematic because
1. Mix of nominal (MusicArtist) and structural typing (recorded.Song)
2. Schema-less information (influencedBy)
3. Inference (hendrix:MusicArtist)
4. Sheer size of terminology
How to type a query?
Steffen Staab Programming with Semantic Broad Data 34
Example
Code
To be rejected
is not subtype of
How to type a query?
Steffen Staab Programming with Semantic Broad Data 35
Example
Code
To be accepted
is a
How to type a query?
Steffen Staab Programming with Semantic Broad Data 36
What we want to have: Static Type Checking
Challenge:
• A programming language that accepts concept expressions as types and can deal with inferences
Programming with Data from a Knowledge Base
λDL
Steffen Staab Programming with Semantic Broad Data 37
Given the simply typed λ-calculus
• Atomic types: A = {..., Ai, ...}
• Plus function types: T = {..., Ai, ..., Ti → Tj, ...}
Add elements
• Concept expressions (intensional NPQL queries)
• Instances (extensional NPQL queries)
Add knowledge
• Typing and subtyping derived from knowledge base
Core Ideas of λDL
Steffen Staab Programming with Semantic Broad Data 38
Concept Forming Expressions   Syntax      Semantics
Top                           ⊤           Δ^I
Bottom                        ⊥           ∅
Concept Name                  A           A^I
Intersection                  A ⊓ B       A^I ∩ B^I
Negation                      ¬A          Δ^I \ A^I
Existential Restriction       ∃R.C        { a ∈ Δ^I | ∃b: (a,b) ∈ R^I and b ∈ C^I }

Axioms                        Syntax      Semantics
T-Box Subclass                C ⊑ D       C^I ⊆ D^I
A-Box Concept assertion       a : C       a^I ∈ C^I
A-Box Role assertion          (a,b) : R   (a^I, b^I) ∈ R^I
Description Logics Fragment
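The semantics column of this fragment can be read as a recipe for computing the extension of a concept expression over a finite interpretation. The sketch below does that in Python; the tuple encoding of expressions, the function `interpret`, and the tiny example interpretation are all assumptions made for illustration (the names `hendrix`, `recorded`, `Song`, and `MusicArtist` are borrowed from the mapping example earlier in the deck).

```python
def interpret(concept, domain, concepts, roles):
    """Return the set of domain elements belonging to a concept expression.

    Expressions are nested tuples: ("top",), ("bottom",), ("name", A),
    ("and", C, D), ("not", C), ("exists", R, C).
    """
    kind = concept[0]
    if kind == "top":
        return set(domain)
    if kind == "bottom":
        return set()
    if kind == "name":
        return set(concepts.get(concept[1], ()))
    if kind == "and":
        return (interpret(concept[1], domain, concepts, roles)
                & interpret(concept[2], domain, concepts, roles))
    if kind == "not":
        return set(domain) - interpret(concept[1], domain, concepts, roles)
    if kind == "exists":
        fillers = interpret(concept[2], domain, concepts, roles)
        return {a for (a, b) in roles.get(concept[1], ()) if b in fillers}
    raise ValueError(f"unknown constructor: {kind!r}")


# Hypothetical finite interpretation.
DOMAIN = {"hendrix", "bowie", "song1"}
CONCEPTS = {"MusicArtist": {"bowie"}, "Song": {"song1"}}
ROLES = {"recorded": {("hendrix", "song1")}}

if __name__ == "__main__":
    recorded_song = ("exists", "recorded", ("name", "Song"))
    print(interpret(recorded_song, DOMAIN, CONCEPTS, ROLES))
```

Note that this evaluates one fixed, closed interpretation. A description logic reasoner instead considers all models of the knowledge base, so the sketch illustrates the semantics only, not inference.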
Steffen Staab Programming with Semantic Broad Data 39
Universal model of computation
• Abstraction
• Application
Example:
• λf.λx.f (f x)
Evaluation rules
λ-Calculus
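The example term on this slide takes a function f and an argument x and applies f to x twice. Written with Python's own lambda syntax (the name `twice` is introduced here for illustration), abstraction and application look like this:

```python
# Abstraction: a function of f that returns a function of x.
twice = lambda f: lambda x: f(f(x))

# Application: supply f first, then x.
add_one = lambda n: n + 1

if __name__ == "__main__":
    print(twice(add_one)(0))
```

Evaluating `twice(add_one)(0)` proceeds by substitution, first replacing f with `add_one` and then x with `0`, which mirrors the evaluation rules of the calculus.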
Steffen Staab Programming with Semantic Broad Data 40
Syntax for core DL
Steffen Staab Programming with Semantic Broad Data 41
Core DL: Evaluation and Typing
Nominal DL-Type
Steffen Staab Programming with Semantic Broad Data 42
Subtyping
Infinitely many types
Add KB knowledge only when needed for checking application, not proactively
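One way to picture "only when needed" is a subtype check that consults the knowledge base at the moment a function application is type-checked, rather than materializing every subtype relation up front. The sketch below is a deliberate simplification: it follows only told subclass axioms between atomic concept names, whereas the system described in this talk relies on a description logic reasoner. The function `is_subtype` and the `AXIOMS` set are hypothetical.

```python
def is_subtype(sub, sup, subclass_axioms):
    """Check on demand whether `sub` reaches `sup` via subclass axioms."""
    seen, stack = set(), [sub]
    while stack:
        current = stack.pop()
        if current == sup:
            return True
        if current in seen:
            continue
        seen.add(current)
        # Look up direct superclasses only for concepts actually visited.
        stack.extend(d for (c, d) in subclass_axioms if c == current)
    return False


# Hypothetical T-Box: pairs (C, D) meaning C is a subclass of D.
AXIOMS = {("MusicArtist", "Artist"), ("Artist", "Agent")}

if __name__ == "__main__":
    # Would an argument of type MusicArtist be accepted where Agent is expected?
    print(is_subtype("MusicArtist", "Agent", AXIOMS))
```

Since concept expressions can be combined without bound, the set of types cannot be enumerated in advance, which is why the check is driven by the two types at hand.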
Steffen Staab Programming with Semantic Broad Data 43
• Queries return sets
• Concept set type needed
• Set operators needed
• Map, Fold, Element
• Queries may return infinite sets
• No theoretical problem, but lack of well-defined stopping conditions in KBs
• Type dispatch based on inferencing
Further issues and opportunities in λDL
Steffen Staab Programming with Semantic Broad Data 44
λDL Interpreter in F# and using HermiT
Steffen Staab Programming with Semantic Broad Data 45
Theorem: A well-typed closed term does not get stuck during evaluation (with common exceptions).
Result for λDL
Typing is a safety net, but does not solve the halting problem (empty list)
Steffen Staab Programming with Semantic Broad Data 46
Conclusion
Steffen Staab Programming with Semantic Broad Data 47
Broad data
• has grown from 10^4 to 10^6 concepts (plus data)
• continues to grow
– more integration of distributed databases
– more sensors of different types
– more crowdwork
• has not been recognized as a problem of its own, yet
• will lead to
– brittleness
– high maintenance efforts
– loss of opportunities
Present of Broad Data
Steffen Staab Programming with Semantic Broad Data 48
New Methods for Broad data
• Explore
– Understand
• Find
• Relate (see e.g. Linda's talk today)
• Program
• Maintain
Future of Broad Data
Steffen Staab Programming with Semantic Broad Data 49
Institute for Web Science and Technologies · University of Koblenz-Landau, Germany &
Web and Internet Science Group · ECS · University of Southampton, UK
Thank you for your attention!
Thanks to my collaborators for this work:
Stefan Scheglmann, Martin Leinberger, Matthias Thimm (WeST, Koblenz)
Evelyne Viegas (Microsoft Research, Redmond)
Ralf Lämmel (SOFTLANG, Koblenz)

Más contenido relacionado

La actualidad más candente

Materials Informatics Overview
Materials Informatics OverviewMaterials Informatics Overview
Materials Informatics Overview
Tony Fast
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in Nice
Dr. Haxel Consult
 

La actualidad más candente (20)

Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRGData Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
 
Materials Informatics and Python
Materials Informatics and PythonMaterials Informatics and Python
Materials Informatics and Python
 
NAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITIONNAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITION
 
Deep learning with Keras
Deep learning with KerasDeep learning with Keras
Deep learning with Keras
 
What is a distributed data science pipeline. how with apache spark and friends.
What is a distributed data science pipeline. how with apache spark and friends.What is a distributed data science pipeline. how with apache spark and friends.
What is a distributed data science pipeline. how with apache spark and friends.
 
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 
Neural Architectures for Named Entity Recognition
Neural Architectures for Named Entity RecognitionNeural Architectures for Named Entity Recognition
Neural Architectures for Named Entity Recognition
 
Materials Informatics Overview
Materials Informatics OverviewMaterials Informatics Overview
Materials Informatics Overview
 
Why is Bioinformatics a Good Fit for Spark?
Why is Bioinformatics a Good Fit for Spark?Why is Bioinformatics a Good Fit for Spark?
Why is Bioinformatics a Good Fit for Spark?
 
Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...
Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...
Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
 
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...
 
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
 
Open-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitOpen-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKit
 
Scalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMScalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAM
 
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
 
Data Enthusiasts London: Scalable and Interoperable data services. Applied to...
Data Enthusiasts London: Scalable and Interoperable data services. Applied to...Data Enthusiasts London: Scalable and Interoperable data services. Applied to...
Data Enthusiasts London: Scalable and Interoperable data services. Applied to...
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in Nice
 

Destacado

CPARK Illustration
CPARK IllustrationCPARK Illustration
CPARK Illustration
plantmonster
 
Open Source APM mit inspectIT - Open Source Workshop Deutsche Bahn
Open Source APM mit inspectIT - Open Source Workshop Deutsche BahnOpen Source APM mit inspectIT - Open Source Workshop Deutsche Bahn
Open Source APM mit inspectIT - Open Source Workshop Deutsche Bahn
Stefan Siegl
 

Destacado (13)

Arbor Vs Slideshare Presentation
Arbor Vs Slideshare PresentationArbor Vs Slideshare Presentation
Arbor Vs Slideshare Presentation
 
MSS VM presentation
MSS VM presentationMSS VM presentation
MSS VM presentation
 
Tecnologia paz mancinelli mateo perrone 2
Tecnologia paz mancinelli mateo perrone 2Tecnologia paz mancinelli mateo perrone 2
Tecnologia paz mancinelli mateo perrone 2
 
Deb's Back Porch Tales On Power Point
Deb's Back Porch Tales On Power PointDeb's Back Porch Tales On Power Point
Deb's Back Porch Tales On Power Point
 
Eco 316 week 5 final paper
Eco 316 week 5 final paperEco 316 week 5 final paper
Eco 316 week 5 final paper
 
CPARK Illustration
CPARK IllustrationCPARK Illustration
CPARK Illustration
 
inspectIT @ SoftwareQualityDays 2015 - Tool Challenge
inspectIT @ SoftwareQualityDays 2015 - Tool ChallengeinspectIT @ SoftwareQualityDays 2015 - Tool Challenge
inspectIT @ SoftwareQualityDays 2015 - Tool Challenge
 
Teknoloji Kullanimi - 4
Teknoloji Kullanimi - 4Teknoloji Kullanimi - 4
Teknoloji Kullanimi - 4
 
Open Source APM mit inspectIT - Open Source Workshop Deutsche Bahn
Open Source APM mit inspectIT - Open Source Workshop Deutsche BahnOpen Source APM mit inspectIT - Open Source Workshop Deutsche Bahn
Open Source APM mit inspectIT - Open Source Workshop Deutsche Bahn
 
Cagri Merkezi Mesleki Yabanci Dil Ornek Diyalog 4
Cagri Merkezi Mesleki Yabanci Dil Ornek Diyalog 4Cagri Merkezi Mesleki Yabanci Dil Ornek Diyalog 4
Cagri Merkezi Mesleki Yabanci Dil Ornek Diyalog 4
 
Marketing de servicios
Marketing de serviciosMarketing de servicios
Marketing de servicios
 
The State of Ceph, Manila, and Containers in OpenStack
The State of Ceph, Manila, and Containers in OpenStackThe State of Ceph, Manila, and Containers in OpenStack
The State of Ceph, Manila, and Containers in OpenStack
 
Bizzy Minds-PILOT
Bizzy Minds-PILOTBizzy Minds-PILOT
Bizzy Minds-PILOT
 

Similar a Programming with Semantic Broad Data

An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...
An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...
An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...
Databricks
 

Similar a Programming with Semantic Broad Data (20)

Semantic Technologies and Programmatic Access to Semantic Data
Semantic Technologies and Programmatic Access to Semantic Data Semantic Technologies and Programmatic Access to Semantic Data
Semantic Technologies and Programmatic Access to Semantic Data
 
Seamless semantics - avoiding semantic discontinuity
Seamless semantics - avoiding semantic discontinuitySeamless semantics - avoiding semantic discontinuity
Seamless semantics - avoiding semantic discontinuity
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
Programming the Semantic Web
Programming the Semantic WebProgramming the Semantic Web
Programming the Semantic Web
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltre
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
 
IT_Tools_in_Research.ppt
IT_Tools_in_Research.pptIT_Tools_in_Research.ppt
IT_Tools_in_Research.ppt
 
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
 
An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...
An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...
An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...
 
From Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim HunterFrom Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim Hunter
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
 
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics EcosystemXDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
 
Big data berlin
Big data berlinBig data berlin
Big data berlin
 
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014:  Social Network Benchmark (SNB) Graph GeneratorFOSDEM 2014:  Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
 
A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companies
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
 
New Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 EditionNew Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 Edition
 
Information-Rich Programming in F# with Semantic Data
Information-Rich Programming in F# with Semantic DataInformation-Rich Programming in F# with Semantic Data
Information-Rich Programming in F# with Semantic Data
 
What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017 What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017
 

Más de Steffen Staab

Más de Steffen Staab (20)

Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Knowledge graphs for knowing more and knowing for sure
Knowledge graphs for knowing more and knowing for sureKnowledge graphs for knowing more and knowing for sure
Knowledge graphs for knowing more and knowing for sure
 
Symbolic Background Knowledge for Machine Learning
Symbolic Background Knowledge for Machine LearningSymbolic Background Knowledge for Machine Learning
Symbolic Background Knowledge for Machine Learning
 
Soziale Netzwerke und Medien: Multi-disziplinäre Ansätze für ein multi-dimens...
Soziale Netzwerke und Medien: Multi-disziplinäre Ansätze für ein multi-dimens...Soziale Netzwerke und Medien: Multi-disziplinäre Ansätze für ein multi-dimens...
Soziale Netzwerke und Medien: Multi-disziplinäre Ansätze für ein multi-dimens...
 
Web Futures: Inclusive, Intelligent, Sustainable
Web Futures: Inclusive, Intelligent, SustainableWeb Futures: Inclusive, Intelligent, Sustainable
Web Futures: Inclusive, Intelligent, Sustainable
 
Eyeing the Web
Eyeing the WebEyeing the Web
Eyeing the Web
 
Concepts in Application Context ( How we may think conceptually )
Concepts in Application Context ( How we may think conceptually )Concepts in Application Context ( How we may think conceptually )
Concepts in Application Context ( How we may think conceptually )
 
Storing and Querying Semantic Data in the Cloud
Storing and Querying Semantic Data in the CloudStoring and Querying Semantic Data in the Cloud
Storing and Querying Semantic Data in the Cloud
 
Semantics reloaded
Semantics reloadedSemantics reloaded
Semantics reloaded
 
Ontologien und Semantic Web - Impulsvortrag Terminologietag
Ontologien und Semantic Web - Impulsvortrag TerminologietagOntologien und Semantic Web - Impulsvortrag Terminologietag
Ontologien und Semantic Web - Impulsvortrag Terminologietag
 
Opinion Formation and Spreading
Opinion Formation and SpreadingOpinion Formation and Spreading
Opinion Formation and Spreading
 
The Web We Want
The Web We WantThe Web We Want
The Web We Want
 
10 Jahre Web Science
10 Jahre Web Science10 Jahre Web Science
10 Jahre Web Science
 
Wwsss intro2016-final
Wwsss intro2016-finalWwsss intro2016-final
Wwsss intro2016-final
 
10 Years Web Science
10 Years Web Science10 Years Web Science
10 Years Web Science
 
Semantic Web Technologies: Principles and Practices
Semantic Web Technologies: Principles and PracticesSemantic Web Technologies: Principles and Practices
Semantic Web Technologies: Principles and Practices
 
Closing Session ISWC 2015
Closing Session ISWC 2015Closing Session ISWC 2015
Closing Session ISWC 2015
 
ISWC2015 Opening Session
ISWC2015 Opening SessionISWC2015 Opening Session
ISWC2015 Opening Session
 
Bias in the Social Web
Bias in the Social WebBias in the Social Web
Bias in the Social Web
 
The Semantic Web - Interacting with the Unknown
The Semantic Web - Interacting with the UnknownThe Semantic Web - Interacting with the Unknown
The Semantic Web - Interacting with the Unknown
 

Último

%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 

Último (20)

Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 

Programming with Semantic Broad Data

  • 1. Steffen Staab: Programming with Semantic Broad Data. Institute for Web Science and Technologies · University of Koblenz-Landau, Germany & Web and Internet Science Group · ECS · University of Southampton, UK. @ststaab · west.uni-koblenz.de
  • 2. The World of Big Data – Volume & Velocity. Genome data: up to 200 GB/person. Video data: 300 hrs uploaded per minute. Sensor data: 5000 sensors/jet engine, 1 Tbit/s. 360 TB/disc. https://flic.kr/p/8zuDTm https://flic.kr/p/59jc2h
  • 3. The World of Big Data – Volume & Velocity. Genome data: up to 200 GB/person. Video data: 300 hrs uploaded per minute. Sensor data: 5000 sensors/jet engine, 1 Tbit/s. 18 concepts. Noise, amplitudes. https://flic.kr/p/8zuDTm https://flic.kr/p/59jc2h
  • 4. The World of Big Data – Variety. Data models: graph data, relational, XML, RDF, CSV, JPEG, MPEG-1, 2, 4, DICOM, PDF, Excel, ... Conceptual models: aka ER schemata, aka logical schemata, aka XML schemata, aka RDFS / OWL ontologies; FOAF, Dublin Core, Marc81, Unifact, ... Dozens, hundreds, "∞".
  • 5. The World of Big Data – Variety – 15 years ago. SAP: on the order of 10,000 'concepts'; days to find the right column. Medical information system (Lars): treating transplant patients; approx. 10,000 concepts. Only my very limited experiences. Big consulting business.
  • 6. The World of Big Data – Variety – Today! Wikidata: 1,148,230 concepts, 2515 relations. UMLS: 1 million concepts. Bioinformatics: 1000s of public databases, 35 in Bio2RDF (11 billion triples). eGov datasets: 200,000 by Fraunhofer FOKUS, 20,000 by ODI. Knowledge graphs: ask Google, Microsoft, Samsung, HP, ... Sensor types: 330 broad types in Wikipedia, tens of thousands. How to write valid, robust programs? How to find data?
  • 7. How to write a valid, robust program? SELECT ?x WHERE { ?x a CONCEPT15 } versus SELECT ?x WHERE { ?x a CONCEPT151735 }. 18 concepts versus 1,148,230 concepts (March '16) and 1,166,040 concepts (Sept. '16). https://flic.kr/p/8zuDTm
  • 8. How to approach big data. In the following I am guessing what Axel Polleres might have told you about Enterprise Linked Data.
  • 9. Traditional Information Architecture. Business Logic, Structured Data, Unstructured Data, Presentation and Interaction. Characteristics: processes are known; data structures are known; meaning of data primarily in schema and code.
  • 10. Big Data in Today's Information Architecture. Characteristics: little structure; semi-structured data; meaning of data of primary importance!
  • 11. Variety Issue 1: Data Models. Data models: relational, tree (XML, ...), document-oriented, stream, array, graph DB. RDF graph data model as common denominator.
  • 12. Dealing with Issue 1: RDF as Data Model. RDF graph data model as common denominator. Example graph: Bowie knows Sarandon; Bowie bornOn 8-1-1947.
  • 13. Variety Issue 2: Conceptual Models. Conceptual models: ER, UML, ... RDFS ontology as common denominator.
  • 14. Variety Issue 2: RDFS as common conceptual meta model. RDFS for explicit conceptual description. Example graph: Bowie knows Sarandon; Bowie bornOn 8-1-1947; Bowie type MusicArtist; Sarandon type Actor.
  • 15. Variety Issue 3: System Boundaries. IRIs for globally unique referencing. Example graph: m:Bowie f:knows d:Sarandon; m:Bowie m:bornOn 8-1-1947; m:Bowie rdf:type m:MusicArtist; d:Sarandon rdf:type d:Actor. Prefixes: m = http://musicbrainz.org, d = http://dbpedia.org, f = http://xmlns.com/foaf/0.1/, rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns#
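The example graph of slides 12 to 15 can be written down as a minimal sketch in plain Python, without assuming any RDF library. The prefixes m, d and f come from slide 15; the trailing slashes, the expansion of rdf to the standard RDF namespace, the ISO form of the date literal, and the helper names `iri` and `instances_of` are assumptions made for illustration.

```python
# Prefixes m, d, f as on slide 15 (trailing slashes added here);
# rdf is expanded to the standard RDF namespace.
PREFIXES = {
    "m": "http://musicbrainz.org/",
    "d": "http://dbpedia.org/",
    "f": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def iri(curie):
    """Expand a prefixed name such as 'f:knows' into a full IRI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


# The example graph as a set of (subject, predicate, object) triples.
GRAPH = {
    (iri("m:Bowie"), iri("f:knows"), iri("d:Sarandon")),
    (iri("m:Bowie"), iri("m:bornOn"), "1947-01-08"),  # literal, 8-1-1947 on the slide
    (iri("m:Bowie"), iri("rdf:type"), iri("m:MusicArtist")),
    (iri("d:Sarandon"), iri("rdf:type"), iri("d:Actor")),
}


def instances_of(graph, cls):
    """Return all subjects typed with cls, like SELECT ?x WHERE { ?x a <cls> }."""
    return {s for (s, p, o) in graph if p == iri("rdf:type") and o == cls}


artists = instances_of(GRAPH, iri("m:MusicArtist"))
```

Because every node and edge label is a full IRI, triples from MusicBrainz and DBpedia can live in the same set without name clashes, which is the point of slide 15.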
  • 16. A Practical Perspective on Broad Data with LITEQ
  • 17. Drosophila: Linked Open Data Cloud. Dozens of domains, hundreds of data sources, thousands of concepts, millions of entities, billions of triples: Semantic Broad Data. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • 18. Programming with Linked Data
  • 19. Programming with Linked Data: Tasks of the Programmer. (1) Schema exploration. (2) Programming code types. (3) Programming queries. (4) Programming procedures for creating, manipulating, and persisting objects.
  • 20. Node Path Query Language Using Autocompletion. Exploration of classes.
  • 21. Node Path Query Language Using Autocompletion. Exploration of classes. Exploration of relations.
  • 22. Node Path Query Language: Query Formulation. Exploration of classes. Exploration of relations. Querying for instances. Type: set of mo:MusicArtist. No definition or declaration needed.
  • 23. Node Path Query Language for Code Development. Exploration of classes. Exploration of relations. Querying for instances. Developing code with queries. All translated into SPARQL queries at: development time; compile time, for type inference (but also as part of the IDE); run time, when querying again. One language to bind them all.
  • 24. Node Path Query Language for Code Development. Exploration of classes. Exploration of relations. Querying for instances. Developing code with queries. Developing code with new classes. All translated into SPARQL queries at: development time; run-time update; persistence!
  • 25. Formal NPQL Syntax. Annotations: data browsing; restricting class expressions; evaluating class expressions; navigating from data to classes; navigating from data to property types; URI set. Query kinds: intensional queries, extensional queries, navigational queries.
  • 26. NPQL Algebra (Example). Reversibility can be used to simplify path expressions.
  • 27. Summary on LITEQ: Language Integrated Types, Extensions, and Queries. NPQL (Node Path Query Language): navigational queries; intensional queries; extensional queries; compilation to SPARQL. LITEQ: implementation of NPQL as an F# type provider in Visual Studio; autocompletion using NPQL queries; automatic typing of extensional query results by intensional queries.
  • 28. "That seems to work very well in practice, but how does it work in theory?" let allArtists = Store.NPQL().``mo:MusicArtist``.Extension What is implied by such a line... ...for the programme? ...for the compiler?
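Slides 23 and 27 state that NPQL expressions are compiled to SPARQL. As a rough illustration of that idea, the sketch below builds a SPARQL query for the extension of a class, optionally followed by navigation along properties. The function `extension_query`, its parameters, the variable naming scheme and the `foaf:made` path are hypothetical; this is not LITEQ's actual translation, and prefix declarations are omitted.

```python
def extension_query(class_name, path=()):
    """Build a SPARQL query for the instances of class_name,
    optionally navigating along a sequence of properties."""
    # Start from all instances of the class.
    patterns = [f"?x0 a {class_name} ."]
    # Each navigation step adds one triple pattern and one fresh variable.
    for i, prop in enumerate(path):
        patterns.append(f"?x{i} {prop} ?x{i + 1} .")
    target = f"?x{len(path)}"
    body = " ".join(patterns)
    return f"SELECT DISTINCT {target} WHERE {{ {body} }}"


# Extension of mo:MusicArtist, in the spirit of the F# line on slide 28.
all_artists_query = extension_query("mo:MusicArtist")

# One navigation step from artists to the things they made.
made_query = extension_query("mo:MusicArtist", ["foaf:made"])
```

Under these assumptions `all_artists_query` is `SELECT DISTINCT ?x0 WHERE { ?x0 a mo:MusicArtist . }`. The same string could be sent to an endpoint at development time for autocompletion and again at run time for the data.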
  • 29. A Foundational Perspective on Semantic Broad Data Using λDL
  • 30. Programming with Data from a Knowledge Base. What we want to have: static type checking. But: in LITEQ, queries must receive types; the number of types in our system is very/infinitely large; existing type systems expect complete knowledge. Issue in our prototype.
  • 31. Related Work. Generic types: everything is a node or an edge; no type checking! Only 2nd place in Halo competition. Mapping approaches: Hibernate, LITEQ, ActiveRDF, Summer / Winter, ... Preferred in SemWeb now. Been there, done that.
  • 32. Example – and Issues with Mapping. Mapping DL types to PL types is problematic because of: (1) mix of nominal (MusicArtist) and structural typing (∃recorded.Song); (2) schema-less information (influencedBy); (3) inference (hendrix:MusicArtist); (4) sheer size of terminology. How to type a query?
  • 33. Example Code: to be rejected ("is not subtype of"). How to type a query?
  • 34. Example Code: to be accepted ("is a"). How to type a query?
  • 35. Programming with Data from a Knowledge Base. What we want to have: static type checking. Challenge: a programming language that accepts concept expressions as types and can deal with inferences: λDL.
  • 36. Core Ideas of λDL. Given the simply typed λ-calculus $\lambda_{\rightarrow}$: atomic types $A = \{\ldots, A_i, \ldots\}$ plus function types $T = \{\ldots, A_i, \ldots, T_i \rightarrow T_j, \ldots\}$. Add elements: concept expressions (intensional NPQL queries); instances (extensional NPQL queries). Add knowledge: typing and subtyping derived from the knowledge base.
  • 37. Description Logics Fragment. Concept-forming expressions (syntax; semantics): Top: $\top$; $\Delta^I$. Bottom: $\bot$; $\emptyset$. Concept name: $A$; $A^I$. Intersection: $A \sqcap B$; $A^I \cap B^I$. Negation: $\neg A$; $\Delta^I \setminus A^I$. Existential restriction: $\exists R.C$; $\{a \in \Delta^I \mid \exists b.\ (a,b) \in R^I \wedge b \in C^I\}$. Axioms (syntax; semantics): T-Box subclass: $C \sqsubseteq D$; $C^I \subseteq D^I$. A-Box concept assertion: $a : C$; $a^I \in C^I$. A-Box role assertion: $(a,b) : R$; $(a^I, b^I) \in R^I$.
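The set semantics of the concept-forming expressions on slide 37 can be traced with a small evaluator over one finite interpretation. This is only a sketch: a DL reasoner considers all models of a knowledge base, whereas the code below evaluates a single, fixed interpretation. The class names, the function `ext`, and the individual `purple_haze` are assumptions for illustration; hendrix, beatles, MusicArtist, Song and recorded follow the running example in the notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Top:
    pass


@dataclass(frozen=True)
class Bottom:
    pass


@dataclass(frozen=True)
class Name:
    name: str


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Not:
    arg: object


@dataclass(frozen=True)
class Exists:
    role: str
    filler: object


def ext(c, domain, concepts, roles):
    """Return the extension C^I of concept c in a finite interpretation."""
    if isinstance(c, Top):
        return set(domain)
    if isinstance(c, Bottom):
        return set()
    if isinstance(c, Name):
        return set(concepts.get(c.name, ()))
    if isinstance(c, And):
        return ext(c.left, domain, concepts, roles) & ext(c.right, domain, concepts, roles)
    if isinstance(c, Not):
        return set(domain) - ext(c.arg, domain, concepts, roles)
    if isinstance(c, Exists):
        filler = ext(c.filler, domain, concepts, roles)
        # All a that have some role successor b inside the filler's extension.
        return {a for (a, b) in roles.get(c.role, ()) if b in filler}
    raise TypeError(f"unknown concept expression: {c!r}")


# A hypothetical interpretation: domain, concept extensions, role extensions.
DOMAIN = {"hendrix", "beatles", "purple_haze"}
CONCEPTS = {"MusicArtist": {"beatles"}, "Song": {"purple_haze"}}
ROLES = {"recorded": {("hendrix", "purple_haze")}}

recorded_a_song = ext(Exists("recorded", Name("Song")), DOMAIN, CONCEPTS, ROLES)
```

In this interpretation `recorded_a_song` contains only `hendrix`, which mirrors the structural type ∃recorded.Song from the mapping example.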
  • 38. λ-Calculus. Universal model of computation: abstraction; application. Example: $\lambda f.\lambda x.\, f\,(f\,x)$. Evaluation rules.
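The example term λf.λx.f (f x) on slide 38 uses both constructs: two abstractions and two applications. It can be written directly with Python lambdas; the names `twice` and `succ` are chosen here for illustration only.

```python
# λf.λx.f (f x): take a function f, return a function that applies f twice.
twice = lambda f: lambda x: f(f(x))


def succ(n):
    """Successor function, used as the argument f."""
    return n + 1


# Application: (twice succ) 0 evaluates to succ(succ(0)).
result = twice(succ)(0)
```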
  • 39. Syntax for core λDL
  • 40. Core λDL: Evaluation and Typing. Nominal DL-Type.
  • 41. Subtyping: many types. Add KB knowledge only when needed for checking application, not proactively.
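The idea on slide 41, consulting the knowledge base only when an application has to be checked, can be sketched as an on-demand subtype test instead of materialising every type up front. The sketch below only follows told subclass axioms between atomic concepts; full DL subsumption would need a reasoner such as HermiT (slide 43). The axioms and the names `is_subtype` and `check_application` are hypothetical.

```python
# Told subclass axioms between atomic concepts (hypothetical example data).
SUBCLASS_AXIOMS = {
    "MusicArtist": {"Artist"},
    "Artist": {"Agent"},
    "Song": {"Work"},
}


def is_subtype(sub, sup, axioms=SUBCLASS_AXIOMS):
    """Check whether sub is subsumed by sup by following subclass axioms on demand."""
    seen, stack = set(), [sub]
    while stack:
        current = stack.pop()
        if current == sup:
            return True
        if current not in seen:  # guards against cycles in the axioms
            seen.add(current)
            stack.extend(axioms.get(current, ()))
    return False


def check_application(param_type, arg_type):
    """Accept an application f(x) only if x's type is a subtype of f's parameter type."""
    if not is_subtype(arg_type, param_type):
        raise TypeError(f"{arg_type} is not a subtype of {param_type}")
    return True
```

A function expecting an `Agent` accepts a `MusicArtist` argument here, while passing a `Song` raises a `TypeError`; only the axioms reachable from the argument type are ever visited.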
  • 42. Further issues and opportunities in λDL. Queries return sets: concept set type needed; set operators needed (map, fold, element). Queries may return infinite sets: no theoretical problem, but lack of well-defined stopping conditions in KBs. Type dispatch based on inferencing.
  • 43. λDL interpreter in F#, using HermiT
  • 44. Result for λDL. Theorem: A well-typed closed term does not get stuck during evaluation (with common exceptions). Typing is a safety net, but does not solve the halting problem (empty list).
  • 45. Conclusion
  • 46. Present of Broad Data. Broad data: has grown from 10^4 to 10^6 concepts (plus data); continues to grow (more integration of distributed databases, more sensors of different types, more crowdwork); has not been recognized as a problem of its own, yet; will lead to brittleness, high maintenance efforts, and loss of opportunities.
  • 47. Future of Broad Data. New methods for broad data: explore and understand; find; relate (see e.g. Linda's talk today); program; maintain.
  • 48. Thank you for your attention! Thanks to my collaborators for this work: Stefan Scheglmann, Martin Leinberger, Matthias Thimm (WeST, Koblenz); Evelyne Viegas (Microsoft Research, Redmond); Ralf Lämmel (SOFTLANG, Koblenz).

Editor's notes

  1. Programming with Semantic Broad Data, Steffen Staab. Abstract: Challenges of Big Data are frequently explained by the technical challenges arising from dealing with Volume, Velocity, Variety, and Veracity. The large variety of data in organisations results from having access to a broad set of different information systems with heterogeneous schemata or ontologies. In this talk I will present research efforts that target the management of such broad data. They include: (i) an integrated development environment for programming with broad data, (ii) a query language that allows for typing of query results, (iii) a typed lambda-calculus based on description logics, and (iv) efficient access to data repositories via schema indices. CV: Steffen Staab is professor of Databases and Information Systems at Universität Koblenz-Landau and holds a chair in Computer and Web Science at the University of Southampton. He is interested in managing text and data, and specifically in methods that target the management of explicit data semantics as well as the discovery of implicit text and data semantics.
  2. 360 TB/disc: Scientists at the University of Southampton have made a major step forward in the development of digital data storage that is capable of surviving for billions of years. Using nanostructured glass, scientists from the University's Optoelectronics Research Centre (ORC) have developed the recording and retrieval processes of five-dimensional (5D) digital data by femtosecond laser writing. The storage allows unprecedented properties, including 360 TB/disc data capacity, thermal stability up to 1,000°C, and virtually unlimited lifetime at room temperature (13.8 billion years at 190°C), opening a new era of eternal data archiving. Sensor data: The five thousand sensors of a jet engine create an astounding amount of data, 10 GB/s per engine. That is 1.02 Tbps, or 2.04 Tbps for a typical twin-engine aircraft such as the Airbus A320neo or Boeing 737 MAX. For comparison, a Formula 1 car produces around 1.2 GB/s (12.28 Gbps), and the current batch of P&W plane engines collects data in low megabits, not terabits, per second.
  3. EMBL, Cambridge: could produce trillions of triples for genome information, but having triples is not that valuable for this task. Rolls-Royce, X-Media project: not very interesting data for the knowledge engineer.
  4. Distributed data and responsibilities. Cross-enterprise data services. Ad hoc data (e.g. new sensors). Semantic Web data is (i) provided by different people in an ad hoc manner, (ii) distributed, (iii) semi-structured, (iv) (more or less) typed, and (v) supposed to be used serendipitously.
  5. Impedance mismatch
  10. The pun on LINQ is intended. LINQ: Language Integrated Query. LITEQ: Language Integrated Types, Extensions, and Queries.
  11. This slide still needs to be synchronized with the earlier one.
  12. "Data source" is two words; adjust the layout. Black on dark grey is hard to read. The developer is missing from the picture.
  13. When we zoom into a data source, we find triples that describe classes (e.g. creature), triples that describe schema information about binary relations (e.g. hasOwner), and triples that describe the data itself (e.g. Bob). In addition, there are unique identifiers in the form of URIs, which for today we simply think of as Java package names. What does the programmer then have to do?
  14. Automatic typing is not possible for general queries
  15. Static type checking: better informed interfaces, avoiding run-time errors
  16. (1) Conceptualizations rely on a mixture of nominal (MusicArtist) and structural typing (∃recorded.Song). (2) It is also not uncommon to have a very general conceptualization or none at all, as exemplified by the influencedBy role, which expresses that hendrix has been influenced by the beatles. (3) Additional, implicit statements may be derived by logical reasoning; e.g., in our running example hendrix:MusicArtist can be inferred. Another challenge is not illustrated: (4) In real data sources, the sheer size of potential types may become a problem. It is practically infeasible to explicitly convert all 1,148,230 different concepts of Wikidata into types of a programming language.
  20. ALCO, which is, e.g., part of OWL 2 DL.
  21. Nominal DL type: an individual a may be an instance of infinitely many types. The nominal type is a syntactic trick to pick a most specific type.
  22. Open issues: anonymous entities; metaprogramming (queries returning concepts). Plans: tree-shaped conjunctive queries; generics; changes to the knowledge base.