Open Data de luxe: Querying
public SPARQL endpoints from the
command line, R and Pandas
Bolzano - 13 NOV 2020
We make data actually usable
Making the most of Open Data Hub, Wikidata, DBpedia
and other sources of high quality data
Evolutions of Open Data: 5 star Open Data
★ available on the web (whatever format) but with an open licence
★★ plus: available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
★★★ plus: non-proprietary format (e.g. CSV instead of Excel)
★★★★ plus: use open standards from W3C (RDF and SPARQL) to identify things
★★★★★ plus: link your data to other people’s data to provide context
https://5stardata.info/
Evolutions of Open Data: FAIR
Findable, Accessible, Interoperable and Reusable (FAIR)
FAIR data is not always open data (personal data, competitive data etc.)
❖ It facilitates data interchange on the web
❖ It facilitates data integration across sources even when schemas are
different
❖ It supports evolution of schemas over time with minimal disruption
to data consumers
https://www.go-fair.org
Technology of choice: 1 - RDF
RDF is “a standard model for data
interchange on the Web”
Large graphs are built from triples
@prefix ab: <http://learningsparql.com/ns/addressbook#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
ab:i0432
    ab:firstName "Richard" ;
    ab:lastName "Mutt" ;
    ab:spouse ab:i9771 .
ab:i8301
    ab:firstName "Craig" ;
    ab:lastName "Ellis" ;
    ab:patient ab:i9771 .
ab:i9771
    ab:firstName "Cindy" ;
    ab:lastName "Marshall" .
ab:spouse
    rdf:type owl:SymmetricProperty ;
    rdfs:comment "Identifies someone's spouse" .
ab:patient
    rdf:type rdf:Property ;
    rdfs:comment "Identifies a doctor's patient" .
subject predicate object
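To make the subject / predicate / object idea concrete, here is a minimal sketch in plain Python that treats the address book from the Turtle above as a list of tuples and matches simple patterns against it. The helper names `match` and `first_name` are hypothetical and only for illustration; a real application would more likely use an RDF library.

```python
# Each triple is a (subject, predicate, object) tuple; the values mirror the Turtle above.
TRIPLES = [
    ("ab:i0432", "ab:firstName", "Richard"),
    ("ab:i0432", "ab:lastName", "Mutt"),
    ("ab:i0432", "ab:spouse", "ab:i9771"),
    ("ab:i8301", "ab:firstName", "Craig"),
    ("ab:i8301", "ab:lastName", "Ellis"),
    ("ab:i8301", "ab:patient", "ab:i9771"),
    ("ab:i9771", "ab:firstName", "Cindy"),
    ("ab:i9771", "ab:lastName", "Marshall"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def first_name(triples, subject):
    """Look up the ab:firstName of a subject (hypothetical helper)."""
    return match(triples, s=subject, p="ab:firstName")[0][2]


# Who points at ab:i9771, and through which predicate?
incoming = [(first_name(TRIPLES, s), p) for s, p, _ in match(TRIPLES, o="ab:i9771")]
print(incoming)
```

Leaving a position as a wildcard is roughly what a variable such as `?s` does in a SPARQL triple pattern.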
Technology of choice: 2 - SPARQL
SPARQL, the language to select, update, create and delete triples
PREFIX schema: <http://schema.org/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
SELECT * WHERE {
?t a schema:PerformingArtsTheater ;
geo:asWKT ?pos ;
schema:name ?posLabel .
}
Technology of choice: 2 - SPARQL
SPARQL is similar to SQL, but built for the web age:
★ HTTP/S as transport protocol
★ No drivers required
★ Standardized by the W3C
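Because the transport is plain HTTP, a SPARQL request can be assembled with nothing but a standard library. The sketch below uses Python's `urllib` to build, without sending, a GET request for the theater query; the function name `build_sparql_request` is hypothetical, and the endpoint URL is the Open Data Hub one used later in the talk.

```python
import urllib.parse
import urllib.request


def build_sparql_request(endpoint, query, accept="application/sparql-results+json"):
    """Build (but do not send) an HTTP GET request for a SPARQL SELECT query."""
    # The query travels as the URL-encoded "query" parameter.
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # The Accept header asks the server for a particular result format.
    return urllib.request.Request(url, headers={"Accept": accept})


QUERY = """
PREFIX schema: <http://schema.org/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
SELECT * WHERE {
  ?t a schema:PerformingArtsTheater ;
     geo:asWKT ?pos ;
     schema:name ?posLabel .
}
"""

req = build_sparql_request("https://sparql.opendatahub.bz.it/sparql", QUERY)
print(req.get_method())
print(req.get_header("Accept"))

# Sending it is one more call (needs network access):
# with urllib.request.urlopen(req) as resp:
#     body = resp.read().decode("utf-8")
```

Switching the `accept` argument to `text/csv` should be enough to get CSV instead of JSON from endpoints that support it.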
Your personal SPARQL database: Tracker
Tracker is the file system indexer used by the GNOME
desktop, e.g. for full-text search
$ tracker sparql -q "SELECT DISTINCT ?performerName WHERE {?s
<http://www.tracker-project.org/temp/nmm#performer> ?performerName
. }"
Results:
urn:artist:Yasmine%20Hamdan
urn:artist:Otfried%20Preu%C3%9Fler
urn:artist:Queens%20Of%20The%20Stone%20Age
urn:artist:Guns%20N'Roses
...
Big SPARQL endpoints: Wikidata
Wikidata handles the fact data for Wikipedia articles
[Screenshot with callouts: "Data from Wikidata", "Link to Wikidata entry"]
Big SPARQL endpoints: DBpedia
DBpedia extracts the data from Wikipedia
and makes this data available and
downloadable
Big SPARQL endpoints: Typical queries
Big SPARQL endpoints: datacommons.org
Operated by Google. Integrates
many data sources:
★ United States Census
★ World Bank
★ US Bureau of Labor Statistics
★ Wikipedia
★ National Oceanic and Atmospheric Administration
★ Federal Bureau of Investigation
★ ...
0 KM endpoints: The Open Data Hub
Operated by NOI Techpark (https://sparql.opendatahub.bz.it/)
How can I use these endpoints for my analyses?
Command line: cURL
$ curl -X POST https://query.wikidata.org/sparql \
    -H "Accept: text/csv" --data-urlencode query@countries.rq
# countries.rq
SELECT DISTINCT ?countryLabel ?population ?area
WHERE
{
?country wdt:P31 wd:Q6256 .
?country wdt:P1082 ?population .
?country wdt:P2046 ?area .
MINUS {?country wdt:P31 wd:Q3024240 .}
SERVICE wikibase:label { bd:serviceParam wikibase:language
"en,[AUTO_LANGUAGE]". }
}
ORDER BY DESC(?population)
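Command line: rsparql from Apache Jena
$ ${JENA_DIR}/bin/rsparql --service 'https://query.wikidata.org/sparql' \
    --query countries.rq --results=CSV > countries.csv
# countries.rq
# rsparql parses the query locally, so it may need the Wikidata prefixes declared explicitly:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>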
SELECT DISTINCT ?countryLabel ?population ?area
WHERE
{
?country wdt:P31 wd:Q6256 .
?country wdt:P1082 ?population .
?country wdt:P2046 ?area .
MINUS {?country wdt:P31 wd:Q3024240 .}
SERVICE wikibase:label { bd:serviceParam wikibase:language
"en,[AUTO_LANGUAGE]". }
}
ORDER BY DESC(?population)
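Once the result is a CSV file, any language can take over. As a small sketch, the Python below reads CSV text shaped like countries.csv and derives a population density per country. The three sample rows reuse the values printed by head(r) in the R example of this talk; the helper name `population_density` is hypothetical, and the unit of the area column is whatever Wikidata returns.

```python
import csv
import io

# Stand-in for countries.csv; the header matches the SELECT variables of the query.
CSV_TEXT = """countryLabel,population,area
People's Republic of China,1409517397,9596961
India,1326093247,3287263
United States of America,328239523,9826675
"""


def population_density(csv_file):
    """Yield (country, inhabitants per unit of area) for each CSV row."""
    for row in csv.DictReader(csv_file):
        yield row["countryLabel"], float(row["population"]) / float(row["area"])


densities = dict(population_density(io.StringIO(CSV_TEXT)))
for country, density in densities.items():
    print(f"{country}: {density:.1f}")

# For the real file:
# with open("countries.csv", newline="") as f:
#     densities = dict(population_density(f))
```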
Directly from R
library(WikidataQueryServiceR)
r <- query_wikidata('
SELECT DISTINCT ?countryLabel ?population ?area
WHERE
{
?country wdt:P31 wd:Q6256 .
?country wdt:P1082 ?population .
?country wdt:P2046 ?area .
MINUS {?country wdt:P31 wd:Q3024240 .}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]". }
}
ORDER BY DESC(?population)
')
head(r)
# A tibble: 6 x 3
countryLabel population area
<chr> <dbl> <dbl>
1 People's Republic of China 1409517397 9596961
2 India 1326093247 3287263
3 United States of America 328239523 9826675
...
Python with the requests module
import requests
url = "https://sparql.opendatahub.bz.it/sparql"
q = """
PREFIX schema: <http://schema.org/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
SELECT * WHERE {
?t a schema:PerformingArtsTheater ; geo:asWKT ?pos ; schema:name ?posLabel .
}
"""
# The Accept header (not Content-Type) tells the endpoint which result format to return
r = requests.get(url, params={'query': q},
                 headers={'Accept': 'application/sparql-results+json'})
print(r.json())
It works, but the returned results are not directly usable as a
table.
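The JSON answer follows the W3C SPARQL JSON results layout: a `head.vars` list with the column names and a `results.bindings` list with one object per row. A minimal sketch of flattening that into table-like rows, assuming this standard layout; the function name `bindings_to_rows` is hypothetical and the sample values are taken from the output shown elsewhere in this talk.

```python
def bindings_to_rows(result):
    """Flatten a SPARQL JSON result into (columns, list of plain dicts)."""
    columns = result["head"]["vars"]
    rows = [
        # Unbound variables are simply absent from a binding, so fill them with None.
        {var: (binding[var]["value"] if var in binding else None) for var in columns}
        for binding in result["results"]["bindings"]
    ]
    return columns, rows


# Sample shaped like the endpoint's answer to the theater query.
sample = {
    "head": {"vars": ["t", "pos", "posLabel"]},
    "results": {"bindings": [{
        "t": {"type": "uri",
              "value": "http://noi.example.org/data/poi/9621F83525089644A0D47464D27D634E"},
        "pos": {"type": "literal",
                "datatype": "http://www.opengis.net/ont/geosparql#wktLiteral",
                "value": "POINT (11.3534199999999998 46.4990740000000002)"},
        "posLabel": {"type": "literal", "value": "Kleinkunsttheater Carambolage"},
    }]},
}

columns, rows = bindings_to_rows(sample)
print(columns)
print(rows[0]["posLabel"])

# With pandas installed, the rows load directly:
# df = pandas.DataFrame(rows, columns=columns)
```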
Python with sparql_client
import sparql
endpoint = "https://sparql.opendatahub.bz.it/sparql"
q = """
PREFIX schema: <http://schema.org/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
SELECT * WHERE {
?t a schema:PerformingArtsTheater ; geo:asWKT ?pos ; schema:name ?posLabel .
}
"""
result = sparql.query(endpoint, q)
for row in result:
    print(row)
(<IRI <http://noi.example.org/data/poi/9621F83525089644A0D47464D27D634E>>, <Literal "POINT (11.3534199999999998 46.4990740000000002)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>>, <Literal "Kleinkunsttheater Carambolage">)
...
Good, but needs some rework for Pandas
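One part of that rework is turning the WKT literal into numeric columns. The sketch below parses a two-dimensional WKT point with a regular expression. It is an illustration only: the name `parse_wkt_point` is hypothetical, other geometry types and literals with a leading CRS IRI are not handled, and reading the first coordinate as longitude is an assumption based on the Bolzano value shown above.

```python
import re

# Matches a 2D WKT point such as "POINT (11.35 46.49)"; nothing else is supported.
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(wkt):
    """Return (x, y) as floats; assumed to be (longitude, latitude) here."""
    m = _POINT_RE.match(wkt)
    if m is None:
        raise ValueError(f"not a 2D WKT point: {wkt!r}")
    return float(m.group(1)), float(m.group(2))


lon, lat = parse_wkt_point("POINT (11.3534199999999998 46.4990740000000002)")
print(lon, lat)
```

For anything beyond points, a dedicated geometry library is likely the better choice.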
Python with sparql-dataframe
import sparql_dataframe
endpoint = "https://sparql.opendatahub.bz.it/sparql"
q = """
PREFIX schema: <http://schema.org/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
SELECT * WHERE {
?t a schema:PerformingArtsTheater ; geo:asWKT ?pos ; schema:name ?posLabel .
}
"""
df = sparql_dataframe.get(endpoint, q)
Most comfortable solution for Pandas
What makes RDF / SPARQL great for data exchange?
★ Data can actually be queried, not only downloaded
★ Well-structured data with rich data models, often standardized and with good metadata
★ Data is easy to integrate
★ Technology is easy to integrate
Thank you for your attention
The Team
Diego Calvanese: Scientific advisor of the board, Full professor at unibz, ACM Fellow
Benjamin Cogrel: CTO, Chair of the board
Peter Hopfgartner: CEO
Marco Montali: Scientific consultant, Assoc. professor at unibz
Guohui Xiao: Chief scientist, Jun. professor at unibz
Big SPARQL endpoints: Typical queries
# Wikidata: bands that start with "Radio"
# try it on https://query.wikidata.org
SELECT DISTINCT ?band ?bandLabel
WHERE
{
?band wdt:P31 wd:Q215380 .
?band rdfs:label ?bandLabel .
FILTER(STRSTARTS(?bandLabel, 'Radio')) .
}
# DBpedia: facts about Joe Biden
SELECT ?property ?hasValue ?isValueOf
WHERE {
{ <http://dbpedia.org/resource/Joe_Biden> ?property ?hasValue }
UNION
{ ?isValueOf ?property <http://dbpedia.org/resource/Joe_Biden> }
}
Evolutions of Open Data: Linked Data
❏ Use URIs to name (identify) things.
❏ Use HTTP URIs so that these things can be looked up
(interpreted, “dereferenced”).
❏ Provide useful information about what a name identifies
when it’s looked up, using open standards such as RDF,
SPARQL, etc.
❏ Refer to other things using their HTTP URI-based names
when publishing data on the Web.
Tim Berners-Lee, 2006

Más contenido relacionado

La actualidad más candente

Toying with spark
Toying with sparkToying with spark
Toying with sparkRaymond Tay
 
SPARQL and Linked Data Benchmarking
SPARQL and Linked Data BenchmarkingSPARQL and Linked Data Benchmarking
SPARQL and Linked Data BenchmarkingKristian Alexander
 
Automated Spark Deployment With Declarative Infrastructure
Automated Spark Deployment With Declarative InfrastructureAutomated Spark Deployment With Declarative Infrastructure
Automated Spark Deployment With Declarative InfrastructureSpark Summit
 
Spark SQL - 10 Things You Need to Know
Spark SQL - 10 Things You Need to KnowSpark SQL - 10 Things You Need to Know
Spark SQL - 10 Things You Need to KnowKristian Alexander
 
Spark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotronSpark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotronDuyhai Doan
 
Apache zeppelin, the missing component for the big data ecosystem
Apache zeppelin, the missing component for the big data ecosystemApache zeppelin, the missing component for the big data ecosystem
Apache zeppelin, the missing component for the big data ecosystemDuyhai Doan
 
Instrumentation with Splunk
Instrumentation with SplunkInstrumentation with Splunk
Instrumentation with SplunkDatavail
 
Spark, Python and Parquet
Spark, Python and Parquet Spark, Python and Parquet
Spark, Python and Parquet odsc
 
useR! 2012 Talk
useR! 2012 TalkuseR! 2012 Talk
useR! 2012 Talkrtelmore
 
Using PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic DataUsing PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic DataJimmy Angelakos
 
Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1ArangoDB Database
 
Spark SQL Join Improvement at Facebook
Spark SQL Join Improvement at FacebookSpark SQL Join Improvement at Facebook
Spark SQL Join Improvement at FacebookDatabricks
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to SparkLi Ming Tsai
 
Relational Database to RDF (RDB2RDF)
Relational Database to RDF (RDB2RDF)Relational Database to RDF (RDB2RDF)
Relational Database to RDF (RDB2RDF)EUCLID project
 
Cassandra 3 new features 2016
Cassandra 3 new features 2016Cassandra 3 new features 2016
Cassandra 3 new features 2016Duyhai Doan
 
Mapping Relational Databases to Linked Data
Mapping Relational Databases to Linked DataMapping Relational Databases to Linked Data
Mapping Relational Databases to Linked DataEUCLID project
 
Intro to Spark development
 Intro to Spark development  Intro to Spark development
Intro to Spark development Spark Summit
 
Onyx data processing the clojure way
Onyx   data processing  the clojure wayOnyx   data processing  the clojure way
Onyx data processing the clojure wayBahadir Cambel
 

La actualidad más candente (20)

Toying with spark
Toying with sparkToying with spark
Toying with spark
 
SPARQL and Linked Data Benchmarking
SPARQL and Linked Data BenchmarkingSPARQL and Linked Data Benchmarking
SPARQL and Linked Data Benchmarking
 
Automated Spark Deployment With Declarative Infrastructure
Automated Spark Deployment With Declarative InfrastructureAutomated Spark Deployment With Declarative Infrastructure
Automated Spark Deployment With Declarative Infrastructure
 
Spark SQL - 10 Things You Need to Know
Spark SQL - 10 Things You Need to KnowSpark SQL - 10 Things You Need to Know
Spark SQL - 10 Things You Need to Know
 
Spark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotronSpark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotron
 
Apache zeppelin, the missing component for the big data ecosystem
Apache zeppelin, the missing component for the big data ecosystemApache zeppelin, the missing component for the big data ecosystem
Apache zeppelin, the missing component for the big data ecosystem
 
Instrumentation with Splunk
Instrumentation with SplunkInstrumentation with Splunk
Instrumentation with Splunk
 
Big data analytics_beyond_hadoop_public_18_july_2013
Big data analytics_beyond_hadoop_public_18_july_2013Big data analytics_beyond_hadoop_public_18_july_2013
Big data analytics_beyond_hadoop_public_18_july_2013
 
Apache spark basics
Apache spark basicsApache spark basics
Apache spark basics
 
Spark, Python and Parquet
Spark, Python and Parquet Spark, Python and Parquet
Spark, Python and Parquet
 
useR! 2012 Talk
useR! 2012 TalkuseR! 2012 Talk
useR! 2012 Talk
 
Using PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic DataUsing PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic Data
 
Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1
 
Spark SQL Join Improvement at Facebook
Spark SQL Join Improvement at FacebookSpark SQL Join Improvement at Facebook
Spark SQL Join Improvement at Facebook
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to Spark
 
Relational Database to RDF (RDB2RDF)
Relational Database to RDF (RDB2RDF)Relational Database to RDF (RDB2RDF)
Relational Database to RDF (RDB2RDF)
 
Cassandra 3 new features 2016
Cassandra 3 new features 2016Cassandra 3 new features 2016
Cassandra 3 new features 2016
 
Mapping Relational Databases to Linked Data
Mapping Relational Databases to Linked DataMapping Relational Databases to Linked Data
Mapping Relational Databases to Linked Data
 
Intro to Spark development
 Intro to Spark development  Intro to Spark development
Intro to Spark development
 
Onyx data processing the clojure way
Onyx   data processing  the clojure wayOnyx   data processing  the clojure way
Onyx data processing the clojure way
 

Similar a SFScon 2020 - Peter Hopfgartner - Open Data de luxe

Querying Linked Data with SPARQL (2010)
Querying Linked Data with SPARQL (2010)Querying Linked Data with SPARQL (2010)
Querying Linked Data with SPARQL (2010)Olaf Hartig
 
An Introduction to Spark
An Introduction to SparkAn Introduction to Spark
An Introduction to Sparkjlacefie
 
An Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark MeetupAn Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark Meetupjlacefie
 
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016Ivan Ermilov
 
Introduction to Spark with Python
Introduction to Spark with PythonIntroduction to Spark with Python
Introduction to Spark with PythonGokhan Atil
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Michael Rys
 
Triplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the WebTriplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the WebDaniele Dell'Aglio
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsGuido Schmutz
 
Building highly scalable data pipelines with Apache Spark
Building highly scalable data pipelines with Apache SparkBuilding highly scalable data pipelines with Apache Spark
Building highly scalable data pipelines with Apache SparkMartin Toshev
 
Big Data Beyond the JVM - Strata San Jose 2018
Big Data Beyond the JVM - Strata San Jose 2018Big Data Beyond the JVM - Strata San Jose 2018
Big Data Beyond the JVM - Strata San Jose 2018Holden Karau
 
Machine Learning with H2O, Spark, and Python at Strata 2015
Machine Learning with H2O, Spark, and Python at Strata 2015Machine Learning with H2O, Spark, and Python at Strata 2015
Machine Learning with H2O, Spark, and Python at Strata 2015Sri Ambati
 
Virtuoso RDF Triple Store Analysis Benchmark & mapping tools RDF / OO
Virtuoso RDF Triple Store Analysis Benchmark & mapping tools RDF / OOVirtuoso RDF Triple Store Analysis Benchmark & mapping tools RDF / OO
Virtuoso RDF Triple Store Analysis Benchmark & mapping tools RDF / OOPaolo Cristofaro
 
Jena University Talk 2016.03.09 -- SQL at Zalando Technology
Jena University Talk 2016.03.09 -- SQL at Zalando TechnologyJena University Talk 2016.03.09 -- SQL at Zalando Technology
Jena University Talk 2016.03.09 -- SQL at Zalando TechnologyValentine Gogichashvili
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...Duyhai Doan
 
Introducing Apache Spark's Data Frames and Dataset APIs workshop series
Introducing Apache Spark's Data Frames and Dataset APIs workshop seriesIntroducing Apache Spark's Data Frames and Dataset APIs workshop series
Introducing Apache Spark's Data Frames and Dataset APIs workshop seriesHolden Karau
 
Visualizing Web Data Query Results
Visualizing Web Data Query ResultsVisualizing Web Data Query Results
Visualizing Web Data Query ResultsAnja Jentzsch
 
WWW2012 Tutorial Visualizing SPARQL Queries
WWW2012 Tutorial Visualizing SPARQL QueriesWWW2012 Tutorial Visualizing SPARQL Queries
WWW2012 Tutorial Visualizing SPARQL QueriesPablo Mendes
 
IoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirIoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirLuciano Resende
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleEvan Chan
 

Similar a SFScon 2020 - Peter Hopfgartner - Open Data de luxe (20)

Querying Linked Data with SPARQL (2010)
Querying Linked Data with SPARQL (2010)Querying Linked Data with SPARQL (2010)
Querying Linked Data with SPARQL (2010)
 
An Introduction to Spark
An Introduction to SparkAn Introduction to Spark
An Introduction to Spark
 
An Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark MeetupAn Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark Meetup
 
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
 
Introduction to Spark with Python
Introduction to Spark with PythonIntroduction to Spark with Python
Introduction to Spark with Python
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
 
Triplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the WebTriplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the Web
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
 
Building highly scalable data pipelines with Apache Spark
Building highly scalable data pipelines with Apache SparkBuilding highly scalable data pipelines with Apache Spark
Building highly scalable data pipelines with Apache Spark
 
Big Data Beyond the JVM - Strata San Jose 2018
Big Data Beyond the JVM - Strata San Jose 2018Big Data Beyond the JVM - Strata San Jose 2018
Big Data Beyond the JVM - Strata San Jose 2018
 
Machine Learning with H2O, Spark, and Python at Strata 2015
Machine Learning with H2O, Spark, and Python at Strata 2015Machine Learning with H2O, Spark, and Python at Strata 2015
Machine Learning with H2O, Spark, and Python at Strata 2015
 
Virtuoso RDF Triple Store Analysis Benchmark & mapping tools RDF / OO
Virtuoso RDF Triple Store Analysis Benchmark & mapping tools RDF / OOVirtuoso RDF Triple Store Analysis Benchmark & mapping tools RDF / OO
Virtuoso RDF Triple Store Analysis Benchmark & mapping tools RDF / OO
 
Jena University Talk 2016.03.09 -- SQL at Zalando Technology
Jena University Talk 2016.03.09 -- SQL at Zalando TechnologyJena University Talk 2016.03.09 -- SQL at Zalando Technology
Jena University Talk 2016.03.09 -- SQL at Zalando Technology
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
 
Introducing Apache Spark's Data Frames and Dataset APIs workshop series
Introducing Apache Spark's Data Frames and Dataset APIs workshop seriesIntroducing Apache Spark's Data Frames and Dataset APIs workshop series
Introducing Apache Spark's Data Frames and Dataset APIs workshop series
 
Visualizing Web Data Query Results
Visualizing Web Data Query ResultsVisualizing Web Data Query Results
Visualizing Web Data Query Results
 
WWW2012 Tutorial Visualizing SPARQL Queries
WWW2012 Tutorial Visualizing SPARQL QueriesWWW2012 Tutorial Visualizing SPARQL Queries
WWW2012 Tutorial Visualizing SPARQL Queries
 
Spark streaming
Spark streamingSpark streaming
Spark streaming
 
IoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirIoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache Bahir
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
 

Más de South Tyrol Free Software Conference

SFSCON23 - Rufai Omowunmi Balogun - SMODEX – a Python package for understandi...
SFSCON23 - Rufai Omowunmi Balogun - SMODEX – a Python package for understandi...SFSCON23 - Rufai Omowunmi Balogun - SMODEX – a Python package for understandi...
SFSCON23 - Rufai Omowunmi Balogun - SMODEX – a Python package for understandi...South Tyrol Free Software Conference
 
SFSCON23 - Roberto Innocenti - From the design to reality is here the Communi...
SFSCON23 - Roberto Innocenti - From the design to reality is here the Communi...SFSCON23 - Roberto Innocenti - From the design to reality is here the Communi...
SFSCON23 - Roberto Innocenti - From the design to reality is here the Communi...South Tyrol Free Software Conference
 
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data HubSFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data HubSouth Tyrol Free Software Conference
 
SFSCON23 - Marianna d'Atri Enrico Zanardo - How can Blockchain technologies i...
SFSCON23 - Marianna d'Atri Enrico Zanardo - How can Blockchain technologies i...SFSCON23 - Marianna d'Atri Enrico Zanardo - How can Blockchain technologies i...
SFSCON23 - Marianna d'Atri Enrico Zanardo - How can Blockchain technologies i...South Tyrol Free Software Conference
 
SFSCON23 - Lucas Lasota - The Future of Connectivity, Open Internet and Human...
SFSCON23 - Lucas Lasota - The Future of Connectivity, Open Internet and Human...SFSCON23 - Lucas Lasota - The Future of Connectivity, Open Internet and Human...
SFSCON23 - Lucas Lasota - The Future of Connectivity, Open Internet and Human...South Tyrol Free Software Conference
 
SFSCON23 - Giovanni Giannotta - Intelligent Decision Support System for trace...
SFSCON23 - Giovanni Giannotta - Intelligent Decision Support System for trace...SFSCON23 - Giovanni Giannotta - Intelligent Decision Support System for trace...
SFSCON23 - Giovanni Giannotta - Intelligent Decision Support System for trace...South Tyrol Free Software Conference
 
SFSCON23 - Elena Maines - Embracing CI/CD workflows for building ETL pipelines
SFSCON23 - Elena Maines - Embracing CI/CD workflows for building ETL pipelinesSFSCON23 - Elena Maines - Embracing CI/CD workflows for building ETL pipelines
SFSCON23 - Elena Maines - Embracing CI/CD workflows for building ETL pipelinesSouth Tyrol Free Software Conference
 
SFSCON23 - Charles H. Schulz - Why open digital infrastructure matters
SFSCON23 - Charles H. Schulz - Why open digital infrastructure mattersSFSCON23 - Charles H. Schulz - Why open digital infrastructure matters
SFSCON23 - Charles H. Schulz - Why open digital infrastructure mattersSouth Tyrol Free Software Conference
 
SFSCON23 - Thomas Aichner - How IoT and AI are revolutionizing Mass Customiza...
SFSCON23 - Thomas Aichner - How IoT and AI are revolutionizing Mass Customiza...SFSCON23 - Thomas Aichner - How IoT and AI are revolutionizing Mass Customiza...
SFSCON23 - Thomas Aichner - How IoT and AI are revolutionizing Mass Customiza...South Tyrol Free Software Conference
 
SFSCON23 - Mirko Boehm - European regulators cast their eyes on maturing OSS ...
SFSCON23 - Mirko Boehm - European regulators cast their eyes on maturing OSS ...SFSCON23 - Mirko Boehm - European regulators cast their eyes on maturing OSS ...
SFSCON23 - Mirko Boehm - European regulators cast their eyes on maturing OSS ...South Tyrol Free Software Conference
 
SFSCON23 - Marco Pavanelli - Monitoring the fleet of Sasa with free software
SFSCON23 - Marco Pavanelli - Monitoring the fleet of Sasa with free softwareSFSCON23 - Marco Pavanelli - Monitoring the fleet of Sasa with free software
SFSCON23 - Marco Pavanelli - Monitoring the fleet of Sasa with free softwareSouth Tyrol Free Software Conference
 
SFSCON23 - Marco Cortella - KNOWAGE and AICS for 2030 agenda SDG goals monito...
SFSCON23 - Marco Cortella - KNOWAGE and AICS for 2030 agenda SDG goals monito...SFSCON23 - Marco Cortella - KNOWAGE and AICS for 2030 agenda SDG goals monito...
SFSCON23 - Marco Cortella - KNOWAGE and AICS for 2030 agenda SDG goals monito...South Tyrol Free Software Conference
 
SFSCON23 - Lina Ceballos - Interoperable Europe Act - A real game changer
SFSCON23 - Lina Ceballos - Interoperable Europe Act - A real game changerSFSCON23 - Lina Ceballos - Interoperable Europe Act - A real game changer
SFSCON23 - Lina Ceballos - Interoperable Europe Act - A real game changerSouth Tyrol Free Software Conference
 
SFSCON23 - Johannes Näder Linus Sehn - Let’s monitor implementation of Free S...
SFSCON23 - Johannes Näder Linus Sehn - Let’s monitor implementation of Free S...SFSCON23 - Johannes Näder Linus Sehn - Let’s monitor implementation of Free S...
SFSCON23 - Johannes Näder Linus Sehn - Let’s monitor implementation of Free S...South Tyrol Free Software Conference
 
SFSCON23 - Gabriel Ku Wei Bin - Why Do We Need A Next Generation Internet
SFSCON23 - Gabriel Ku Wei Bin - Why Do We Need A Next Generation InternetSFSCON23 - Gabriel Ku Wei Bin - Why Do We Need A Next Generation Internet
SFSCON23 - Gabriel Ku Wei Bin - Why Do We Need A Next Generation InternetSouth Tyrol Free Software Conference
 
SFSCON23 - Davide Vernassa - Empowering Insights Unveiling the latest innova...
SFSCON23 - Davide Vernassa - Empowering Insights  Unveiling the latest innova...SFSCON23 - Davide Vernassa - Empowering Insights  Unveiling the latest innova...
SFSCON23 - Davide Vernassa - Empowering Insights Unveiling the latest innova...South Tyrol Free Software Conference
 

Más de South Tyrol Free Software Conference (20)

SFSCON23 - Rufai Omowunmi Balogun - SMODEX – a Python package for understandi...
SFSCON23 - Rufai Omowunmi Balogun - SMODEX – a Python package for understandi...SFSCON23 - Rufai Omowunmi Balogun - SMODEX – a Python package for understandi...
SFSCON23 - Rufai Omowunmi Balogun - SMODEX – a Python package for understandi...
 
SFSCON23 - Roberto Innocenti - From the design to reality is here the Communi...
SFSCON23 - Roberto Innocenti - From the design to reality is here the Communi...SFSCON23 - Roberto Innocenti - From the design to reality is here the Communi...
SFSCON23 - Roberto Innocenti - From the design to reality is here the Communi...
 
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data HubSFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
 
SFSCON23 - Marianna d'Atri Enrico Zanardo - How can Blockchain technologies i...
SFSCON23 - Marianna d'Atri Enrico Zanardo - How can Blockchain technologies i...SFSCON23 - Marianna d'Atri Enrico Zanardo - How can Blockchain technologies i...
SFSCON23 - Marianna d'Atri Enrico Zanardo - How can Blockchain technologies i...
 
SFSCON23 - Lucas Lasota - The Future of Connectivity, Open Internet and Human...
SFSCON23 - Lucas Lasota - The Future of Connectivity, Open Internet and Human...SFSCON23 - Lucas Lasota - The Future of Connectivity, Open Internet and Human...
SFSCON23 - Lucas Lasota - The Future of Connectivity, Open Internet and Human...
 
SFSCON23 - Giovanni Giannotta - Intelligent Decision Support System for trace...
SFSCON23 - Giovanni Giannotta - Intelligent Decision Support System for trace...SFSCON23 - Giovanni Giannotta - Intelligent Decision Support System for trace...
SFSCON23 - Giovanni Giannotta - Intelligent Decision Support System for trace...
 
SFSCON23 - Elena Maines - Embracing CI/CD workflows for building ETL pipelines



SFScon 2020 - Peter Hopfgartner - Open Data de luxe

  • 1. Open Data de luxe: Querying public SPARQL endpoints from the command line, R and Pandas
      Making the most of Open Data Hub, Wikidata, DBpedia and other sources of high-quality data
      Bolzano - 13 NOV 2020
      We make data actually usable
  • 2. Evolutions of Open Data: 5 star Open Data
      ★ available on the web (whatever format) but with an open licence
      ★★ plus: available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
      ★★★ plus: non-proprietary format (e.g. CSV instead of Excel)
      ★★★★ plus: use open standards from W3C (RDF and SPARQL) to identify things
      ★★★★★ plus: link your data to other people’s data to provide context
      https://5stardata.info/
  • 3. Evolutions of Open Data: FAIR
      Findable, Accessible, Interoperable and Reusable (FAIR)
      FAIR data is not always open data (personal data, competitive data etc.)
      ❖ It facilitates data interchange on the web
      ❖ It facilitates data integration across sources even when schemas are different
      ❖ It supports evolution of schemas over time with minimal disruption to data consumers
      https://www.go-fair.org
  • 4. Technology of choice: 1 - RDF
      RDF is “a standard model for data interchange on the Web”
      Large graphs are built from triples: subject, predicate, object

      @prefix ab:   <http://learningsparql.com/ns/addressbook#> .
      @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
      @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
      @prefix owl:  <http://www.w3.org/2002/07/owl#> .

      ab:i0432
          ab:firstName "Richard" ;
          ab:lastName  "Mutt" ;
          ab:spouse    ab:i9771 .

      ab:i8301
          ab:firstName "Craig" ;
          ab:lastName  "Ellis" ;
          ab:patient   ab:i9771 .

      ab:i9771
          ab:firstName "Cindy" ;
          ab:lastName  "Marshall" .

      ab:spouse
          rdf:type     owl:SymmetricProperty ;
          rdfs:comment "Identifies someone's spouse" .

      ab:patient
          rdf:type     rdf:Property ;
          rdfs:comment "Identifies a doctor's patient" .
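To make the subject / predicate / object idea concrete, here is a minimal sketch in plain Python that stores a few of the address book triples as tuples and filters them by pattern. It is only an illustration of the data model, not an RDF library; the names `TRIPLES` and `match` are made up for this example.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# A few triples from the address book example, written as
# (subject, predicate, object). Prefixed names are kept as plain strings.
TRIPLES: List[Triple] = [
    ("ab:i0432", "ab:firstName", "Richard"),
    ("ab:i0432", "ab:lastName", "Mutt"),
    ("ab:i0432", "ab:spouse", "ab:i9771"),
    ("ab:i8301", "ab:firstName", "Craig"),
    ("ab:i8301", "ab:lastName", "Ellis"),
    ("ab:i8301", "ab:patient", "ab:i9771"),
    ("ab:i9771", "ab:firstName", "Cindy"),
    ("ab:i9771", "ab:lastName", "Marshall"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that agree with every position that is not None."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    # Everything that points at Cindy Marshall (ab:i9771).
    for triple in match(TRIPLES, o="ab:i9771"):
        print(triple)
```

Leaving a position as `None` plays the role of a variable in a query pattern, which is roughly what a SPARQL triple pattern does on a real graph.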
  • 5. Technology of choice: 2 - SPARQL
      SPARQL, the language to select, update, create and delete triples

      PREFIX schema: <http://schema.org/>
      PREFIX geo: <http://www.opengis.net/ont/geosparql#>
      SELECT * WHERE {
        ?t a schema:PerformingArtsTheater ;
           geo:asWKT ?pos ;
           schema:name ?posLabel .
      }
  • 6. Technology of choice: 2 - SPARQL
      SPARQL is similar to SQL, but made for the web age:
      ★ HTTP/S as transport protocol
      ★ No drivers required
      ★ Standardized by the W3C
  • 7. Your personal SPARQL database: Tracker
      Tracker is the file system indexer used by the GNOME desktop, e.g. for full-text search

      $ tracker sparql -q "SELECT DISTINCT ?performerName WHERE {?s <http://www.tracker-project.org/temp/nmm#performer> ?performerName . }"
      Results:
        urn:artist:Yasmine%20Hamdan
        urn:artist:Otfried%20Preu%C3%9Fler
        urn:artist:Queens%20Of%20The%20Stone%20Age
        urn:artist:Guns%20N'Roses
        ...
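The values Tracker prints are percent-encoded URNs rather than readable names. A small sketch of how they could be turned back into plain text, assuming every value keeps the `urn:artist:` prefix shown in the output; the helper name `artist_name` is hypothetical.

```python
from urllib.parse import unquote

ARTIST_PREFIX = "urn:artist:"


def artist_name(urn: str) -> str:
    """Strip the urn:artist: prefix and decode percent-escapes as UTF-8."""
    if not urn.startswith(ARTIST_PREFIX):
        raise ValueError("not an artist URN: " + urn)
    return unquote(urn[len(ARTIST_PREFIX):])


if __name__ == "__main__":
    for urn in (
        "urn:artist:Yasmine%20Hamdan",
        "urn:artist:Otfried%20Preu%C3%9Fler",
        "urn:artist:Queens%20Of%20The%20Stone%20Age",
    ):
        print(artist_name(urn))
```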
  • 8. Big SPARQL endpoints: Wikidata
      Wikidata manages the factual data behind Wikipedia articles
      (Screenshot labels: "Data from Wikidata", "Link to Wikidata entry")
  • 10. Big SPARQL endpoints: DBpedia DBpedia extracts the data from Wikipedia and makes this data available and downloadable
  • 11. Big SPARQL endpoints: Typical queries
  • 12. Big SPARQL endpoints: datacommons.org
      Operated by Google. Integrates many data sources:
      ★ United States Census
      ★ World Bank
      ★ US Bureau of Labor Statistics
      ★ Wikipedia
      ★ National Oceanic and Atmospheric Administration
      ★ Federal Bureau of Investigation
      ★ ...
  • 13. 0 KM endpoints: The Open Data Hub Operated by NOI Techpark (https://sparql.opendatahub.bz.it/)
  • 14. How can I use these endpoints for my analyses?
  • 15. Command line: cURL

      $ curl -X POST https://query.wikidata.org/sparql -H "Accept: text/csv" --data-urlencode query@countries.rq

      # countries.rq
      SELECT DISTINCT ?countryLabel ?population ?area
      WHERE {
        ?country wdt:P31 wd:Q6256 .
        ?country wdt:P1082 ?population .
        ?country wdt:P2046 ?area .
        MINUS {?country wdt:P31 wd:Q3024240 .}
        SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]". }
      }
      ORDER BY DESC(?population)
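For readers who prefer a script to a shell one-liner, the following is a rough standard-library Python equivalent of the cURL call: a form-encoded POST with an `Accept: text/csv` header, followed by CSV parsing. The function names and the `User-Agent` string are placeholders invented for this sketch; only `run` touches the network.

```python
import csv
import io
import urllib.parse
import urllib.request
from typing import Dict, List

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query: str, endpoint: str = WIKIDATA_ENDPOINT) -> urllib.request.Request:
    """Build the same kind of request as the cURL call: POST body query=<urlencoded query>."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Accept": "text/csv",  # ask the endpoint for CSV results
            "Content-Type": "application/x-www-form-urlencoded",
            # Placeholder value; replace with something that identifies your script.
            "User-Agent": "sparql-example/0.1 (hypothetical)",
        },
        method="POST",
    )


def parse_csv(text: str) -> List[Dict[str, str]]:
    """Turn a CSV result document into one dict per row, keyed by variable name."""
    return list(csv.DictReader(io.StringIO(text)))


def run(query: str) -> List[Dict[str, str]]:
    """Send the query and parse the CSV response (performs network I/O)."""
    with urllib.request.urlopen(build_request(query), timeout=60) as response:
        return parse_csv(response.read().decode("utf-8"))
```

Calling `run(open("countries.rq").read())` would then return the same rows that cURL prints, with all values as strings.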
  • 16. Command line: rsparql from Apache Jena

      $ ${JENA_DIR}/bin/rsparql --service 'https://query.wikidata.org/sparql' --query countries.rq --results=CSV > countries.csv

      # countries.rq
      # (a client that parses the query locally may also need explicit PREFIX
      #  declarations for wd:, wdt:, wikibase: and bd:)
      SELECT DISTINCT ?countryLabel ?population ?area
      WHERE {
        ?country wdt:P31 wd:Q6256 .
        ?country wdt:P1082 ?population .
        ?country wdt:P2046 ?area .
        MINUS {?country wdt:P31 wd:Q3024240 .}
        SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]". }
      }
      ORDER BY DESC(?population)
  • 17. Directly from R

      library(WikidataQueryServiceR)
      r <- query_wikidata('
        SELECT DISTINCT ?countryLabel ?population ?area
        WHERE {
          ?country wdt:P31 wd:Q6256 .
          ?country wdt:P1082 ?population .
          ?country wdt:P2046 ?area .
          MINUS {?country wdt:P31 wd:Q3024240 .}
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]". }
        }
        ORDER BY DESC(?population)
      ')
      head(r)

      # A tibble: 6 x 3
        countryLabel               population    area
        <chr>                           <dbl>   <dbl>
      1 People's Republic of China 1409517397 9596961
      2 India                      1326093247 3287263
      3 United States of America    328239523 9826675
      ...
  • 18. Python with the requests module

      import requests

      url = "https://sparql.opendatahub.bz.it/sparql"
      q = """
      PREFIX schema: <http://schema.org/>
      PREFIX geo: <http://www.opengis.net/ont/geosparql#>
      SELECT * WHERE {
        ?t a schema:PerformingArtsTheater ;
           geo:asWKT ?pos ;
           schema:name ?posLabel .
      }
      """
      # Use the Accept header (not Content-Type) to request JSON results.
      r = requests.get(url, params={'query': q},
                       headers={'Accept': 'application/sparql-results+json'})
      print(r.json())

      It works, but the returned results are not directly usable as a table.
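One way to close that gap is to flatten the standard SPARQL JSON result structure (`head.vars` plus `results.bindings`) into one dict per row. The sketch below uses only plain Python. `bindings_to_rows` is a hypothetical helper name, and `SAMPLE` is a hand-written payload in the W3C result format that reuses the theatre values shown elsewhere in this deck, not a captured response.

```python
from typing import Any, Dict, List, Optional

# Hand-written sample in the SPARQL 1.1 JSON results layout (an assumption
# about what the endpoint returns when asked for application/sparql-results+json).
SAMPLE: Dict[str, Any] = {
    "head": {"vars": ["t", "pos", "posLabel"]},
    "results": {
        "bindings": [
            {
                "t": {
                    "type": "uri",
                    "value": "http://noi.example.org/data/poi/9621F83525089644A0D47464D27D634E",
                },
                "pos": {
                    "type": "literal",
                    "datatype": "http://www.opengis.net/ont/geosparql#wktLiteral",
                    "value": "POINT (11.3534199999999998 46.4990740000000002)",
                },
                "posLabel": {"type": "literal", "value": "Kleinkunsttheater Carambolage"},
            }
        ]
    },
}


def bindings_to_rows(payload: Dict[str, Any]) -> List[Dict[str, Optional[str]]]:
    """Flatten a SPARQL JSON result into rows; unbound variables become None."""
    variables = payload["head"]["vars"]
    rows = []
    for binding in payload["results"]["bindings"]:
        rows.append(
            {var: binding[var]["value"] if var in binding else None for var in variables}
        )
    return rows


if __name__ == "__main__":
    for row in bindings_to_rows(SAMPLE):
        print(row)
```

The resulting list of dicts can be handed to `pandas.DataFrame(rows)` if a table is needed; note that all values arrive as strings and datatypes are dropped in this simplified version.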
  • 19. Python with sparql_client

      import sparql

      endpoint = "https://sparql.opendatahub.bz.it/sparql"
      q = """
      PREFIX schema: <http://schema.org/>
      PREFIX geo: <http://www.opengis.net/ont/geosparql#>
      SELECT * WHERE {
        ?t a schema:PerformingArtsTheater ;
           geo:asWKT ?pos ;
           schema:name ?posLabel .
      }
      """
      result = sparql.query(endpoint, q)
      for row in result:
          print(row)

      (<IRI <http://noi.example.org/data/poi/9621F83525089644A0D47464D27D634E>>, <Literal "POINT (11.3534199999999998 46.4990740000000002)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>>, <Literal "Kleinkunsttheater Carambolage">)
      ...

      Good, but needs some rework for Pandas
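Part of that rework is turning the WKT literal into numbers. A minimal sketch, assuming the positions are simple 2D `POINT (x y)` literals like the one printed above and that the first coordinate is the longitude (the usual axis order for GeoSPARQL WKT without an explicit CRS); `parse_wkt_point` is a made-up helper, and anything beyond plain points would need a real geometry library.

```python
import re
from typing import Tuple

# Matches only 2D points such as "POINT (11.35 46.49)"; other WKT geometry
# types are deliberately out of scope for this sketch.
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(wkt: str) -> Tuple[float, float]:
    """Return (x, y) from a WKT POINT literal, i.e. (longitude, latitude) here."""
    found = _POINT_RE.match(wkt)
    if found is None:
        raise ValueError("not a simple WKT point: " + wkt)
    return float(found.group(1)), float(found.group(2))


if __name__ == "__main__":
    lon, lat = parse_wkt_point("POINT (11.3534199999999998 46.4990740000000002)")
    print(lon, lat)
```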
  • 20. Python with sparql-dataframe

      import sparql_dataframe

      endpoint = "https://sparql.opendatahub.bz.it/sparql"
      q = """
      PREFIX schema: <http://schema.org/>
      PREFIX geo: <http://www.opengis.net/ont/geosparql#>
      SELECT * WHERE {
        ?t a schema:PerformingArtsTheater ;
           geo:asWKT ?pos ;
           schema:name ?posLabel .
      }
      """
      df = sparql_dataframe.get(endpoint, q)

      Most comfortable solution for Pandas
  • 21. What makes RDF / SPARQL great for data exchange?
      ★ Data can really be queried, not only downloaded
      ★ Well-structured data with rich data models, often standardized, and good metadata
      ★ Data is easy to integrate
      ★ Technology is easy to integrate
  • 22. Thank you for your attention
  • 23. The Team
      Diego Calvanese: Scientific advisor of the board, Full professor at unibz, ACM Fellow
      Benjamin Cogrel: CTO, Chair of the board
      Peter Hopfgartner: CEO
      Marco Montali: Scientific consultant, Assoc. professor at unibz
      Guohui Xiao: Chief scientist, Jun. professor at unibz
  • 24. Big SPARQL endpoints: Typical queries

      # Wikidata: bands that start with "Radio"
      # try it on https://query.wikidata.org
      SELECT DISTINCT ?band ?bandLabel
      WHERE {
        ?band wdt:P31 wd:Q215380 .
        ?band rdfs:label ?bandLabel .
        FILTER(STRSTARTS(?bandLabel, 'Radio')) .
      }

      # DBpedia: facts about Joe Biden
      SELECT ?property ?hasValue ?isValueOf
      WHERE {
        { <http://dbpedia.org/resource/Joe_Biden> ?property ?hasValue }
        UNION
        { ?isValueOf ?property <http://dbpedia.org/resource/Joe_Biden> }
      }
  • 25. Evolutions of Open Data: Linked Data
      ❏ Use URIs to name (identify) things.
      ❏ Use HTTP URIs so that these things can be looked up (interpreted, “dereferenced”).
      ❏ Provide useful information about what a name identifies when it’s looked up, using open standards such as RDF, SPARQL, etc.
      ❏ Refer to other things using their HTTP URI-based names when publishing data on the Web.
      Tim Berners-Lee, 2006