SlideShare a Scribd company logo
1 of 55
Download to read offline
Linked Data at
Semantic Team
semantica@corp.globo.com
Tatiana Al-Chueyr and Rodrigo D. A. Senra
{tatiana.martins, rodrigo.senra}@corp.globo.com
globo.com
Andréia Bustamante
Ícaro Medeiros
Tatiana Al-Chueyr
Rodrigo Senra
Semantic Team
Franklin Amorim
João Carlos Mendes Luís
Alberto Beloni
André Nicodemus
Contributors
BROADCAST MOVIES PAY TV INTERNET
EVENTS MUSIC
PUBLISHING
NEW VENTURES NEWSPAPERRADIO NETWORK
Motivation
Soccer player
Cross-link content from different web products
Politician
MotivationCross-link content from different web products
Celebrity
Motivation
● Cross-link content from different web products
MotivationCross-link content from different web products
Isabella Nardoni foi morta em 29 de março de 2008
na Zona Norte de São Paulo (Foto:Reprodução)
Isabella de Oliveira Nardoni, de 5
anos, foi morta na noite de 29 de
março de 2008. A perícia concluiu
que a menina foi atirada do sexto
andar do prédio onde moravam seu
pai, Alexandre Nardoni, sua
madrasta, Anna Carolina Jatobá, e
dois filhos pequenos do casal, na
Vila Isolina Mazzei, na zona norte de
São Paulo.
Túmulo de Isabella vira local de visitação em SP; casal Nardoni está preso.
Caso Isabella Nardoni
Juliana Cardilli
G1 SP
RDF
FOAF
GEO
Dublin
Core
SKOS
Semantic markup in web pages
Motivation
Recommend annotations to information Producer
Motivation
Suggest related content to information Consumer
Motivation
Suggest related content to information Consumer
Motivation
Suggest related content to information Consumer
Motivation
Outcomes
● Flexible ways to organize content
● Ease to find related issues
● Explicit relations derived from annotated content
● Up-to-date topic pages with little editorial effort
● Linking content across different web products
● Seamless navigation leading to flow state
Status Quo
Used by the main web products of Globo.com
linking, among others:
○ 18,485 organizations
○ 82,386 people
○ 9,129 places
○ 1,000,000+ annotated news
from August 2010 to May 2013
Legacy Architecture
CDA
CMA
triple
store
search
engine
ontology
CDA
CMA
CDA
CMA
CDA
CMA
CDA
CMA
Legacy Architecture
triple
store
search
engine
ontology
Poor data management
○ direct access to triple store (unmanaged)
○ difficulty to share data (distributed DBs)
○ re-sync triple-store and search engine index
○ scalability of triple store
○ high entropy in distributed ontology engineering
Problems
Problems
Ontology Engineering
Domain-driven
(current)
Base
G1 GE EGO TVG
news sports gossip tv
Upper
Person Organization
Music
Politics
Programme Education
Sports
Product-driven
(past)
Place
Possible Solution
Upper
Ontology
Semantic as a library
○ many different versions in production
○ programming language dependent
○ steep learning curve for RDF/OWL/SPARQL
Problems
Create an open semantic data management platform
● Scalable
● Mobile and Web friendly
● Interconnect Globo's data with external data sources
● Automate content extraction (including NER)
Next Step
Brainiak
linked data restful
API
CDA
CMA
CDA
CMA
CDA
CMA
CDA
CMA
Legacy Architecture
triple
store
search
engine
ontology
API
Brainiak
CMA
CDA
CDA
CDA
CDA
triple
store
search
engine
Under Development
Requirements
● Indirect usage of SPARQL
● Programming language independent
● Data management with quality
● Finer-grained authorization and authentication
● Isolate applications from triplestore
● Improve triplestore performance
SPARQL query
DEFINE input:inference <http://data.globo.com/ruleset>
SELECT ?uri ?label
FROM <http://data.globo.com/sports/>
WHERE
{
?uri a <http://data.globo.com/sports/Team>;
rdfs:label ?label .
}
LIMIT 10
OFFSET 0
task: list all sports teams
/sports/Team
Brainiak query
GET
SPARQL response
Brainiak response
Brainiak concepts
● Instance
● Collection (set of instances from a given Class)
● Schema (the Class definition)
● Context
Instance
Collection
Schema
Context
place
State
Brazil
Country
Japan
City
Real example
/placeGET
/place/CountryGET
/place/Country/_schemaGET
/place/Country/BrazilGET
Real example
resource URL→ /place/Country/Brazil
context (graph)→ http://semantica.globo.com/place/
class → http://semantica.globo.com/place/Country
instance → http://semantica.globo.com/place/Country/Brazil
URI Conventions
/place/River
?graph_uri=http://dbpedia.org/resource/classes#
&class_uri=dbpedia:River
Overriden
context (graph) → http://dbpedia.org/resource/classes#
class → http://dbpedia.org/ontology/River
Convention
context (graph)→ http://semantica.globo.com/place/
class → http://semantica.globo.com/place/River
Legacy URIs
Hypermedia
● Flexibility and programmatic adaptation
● Semantic affordances
● Client has to understand what is consumed
● "Hypermedia APIs are not fully baked yet"
Brainiak hypermedia graph
context instance
/ schema
inCollection
item
instances
instances
describedBy
self
replace
delete
self
instances
self
self
self
create
collection
Services
● List Contexts
● List Collections
● Get a Schema
● List Prefixes
● Status of Services
● Create
● Retrieve
● Delete
● Edit
● List
Instances
Features
● JSON-Schema
● JSON-LD
● REST
● Python + Tornado
OPTIONS GET PUT POST DELETE
/sports/Team
Brainiak query
GET
Brainiak response
Brainiak response
Brainiak response
Brainiak response
SPARQL query
SELECT DISTINCT ?class
WHERE {
<http://data.globo.com/place/City> rdfs:subClassOf ?class OPTION
(TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
?class a owl:Class .
}
task: retrieve all superclasses of a class
SPARQL query
SELECT DISTINCT ?predicate ?predicate_graph ?predicate_comment ?type ?range ?title ?range_graph ?range_label ?super_property
WHERE {
{
GRAPH ?predicate_graph { ?predicate rdfs:domain ?domain_class } .
} UNION {
graph ?predicate_graph {?predicate rdfs:domain ?blank} .
?blank a owl:Class .
?blank owl:unionOf ?enumeration .
OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
OPTIONAL { ?list_node rdf:first ?domain_class } .
}
FILTER (?domain_class IN (<http://data.globo.com/place/City>, <http://data.globo.com/place/GeopoliticalDivision>, <http://data.globo.com/place/Place>, <http://data.globo.
com/upper/Object>, <http://data.globo.com/upper/Substance>, <http://data.globo.com/upper/ConcreteEntity>, <http://data.globo.com/upper/Entity>))
{?predicate rdfs:range ?range .}
UNION {
?predicate rdfs:range ?blank .
?blank a owl:Class .
?blank owl:unionOf ?enumeration .
OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
OPTIONAL { ?list_node rdf:first ?range } .
}
FILTER (!isBlank(?range))
?predicate rdfs:label ?title .
?predicate rdf:type ?type .
OPTIONAL { ?predicate rdfs:subPropertyOf ?super_property } .
FILTER (?type in (owl:ObjectProperty, owl:DatatypeProperty)) .
FILTER(langMatches(lang(?title), "en") OR langMatches(lang(?title), "")) .
OPTIONAL { ?predicate rdfs:comment ?predicate_comment }
FILTER(langMatches(lang(?predicate_comment), "en") OR langMatches(lang(?predicate_comment), "")) .
OPTIONAL {
GRAPH ?range_graph {
?range rdfs:label ?range_label .
FILTER(langMatches(lang(?range_label), "en") OR langMatches(lang(?range_label), "")) .
}
}
}
task: retrieve all properties of a group of classes
SPARQL query
SELECT DISTINCT ?predicate ?min ?max ?range ?enumerated_value ?enumerated_value_label
WHERE {
<http://data.globo.com/place/City> rdfs:subClassOf ?s OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n,
t_min (0)) .
?s owl:onProperty ?predicate .
OPTIONAL { ?s owl:minQualifiedCardinality ?min } .
OPTIONAL { ?s owl:maxQualifiedCardinality ?max } .
OPTIONAL {
{ ?s owl:onClass ?range }
UNION { ?s owl:onDataRange ?range }
UNION { ?s owl:allValuesFrom ?range }
OPTIONAL { ?range owl:oneOf ?enumeration } .
OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
OPTIONAL { ?list_node rdf:first ?enumerated_value } .
OPTIONAL {
?enumerated_value rdfs:label ?enumerated_value_label .
} .
}
}
}
task: retrieve the cardinalities of all properties of a certain class
/place/City/_schema
Brainiak query
GET
● SEO (automatic schema.org)
● Improved annotator (DBpedia Spotlight)
● Richer content relationships (inference)
● Link to open data (e.g. DBPedia, dados.gov.br)
Next steps
Stay tuned
@brainiak_api
... will be soon released
as an open source project !
Semantic Team
semantica@corp.globo.com
globo.com
Thank you
for the attention!

More Related Content

Viewers also liked

Viewers also liked (11)

Uma breve história no tempo...da computação
Uma breve história no tempo...da computaçãoUma breve história no tempo...da computação
Uma breve história no tempo...da computação
 
Brainiak: Um plano maligno de dominação semântica hipermídia
Brainiak: Um plano maligno de dominação semântica hipermídiaBrainiak: Um plano maligno de dominação semântica hipermídia
Brainiak: Um plano maligno de dominação semântica hipermídia
 
Rest - Representational State Transfer (EMC BRDC Internal Tech talk)
Rest - Representational State Transfer (EMC BRDC Internal Tech talk)Rest - Representational State Transfer (EMC BRDC Internal Tech talk)
Rest - Representational State Transfer (EMC BRDC Internal Tech talk)
 
Brainiak - uma API REST Hipermedia
Brainiak - uma API REST Hipermedia Brainiak - uma API REST Hipermedia
Brainiak - uma API REST Hipermedia
 
Rest, Gateway e Compiladores
Rest, Gateway e CompiladoresRest, Gateway e Compiladores
Rest, Gateway e Compiladores
 
Python: A Arma Secreta do Cientista de Dados
Python: A Arma Secreta do Cientista de DadosPython: A Arma Secreta do Cientista de Dados
Python: A Arma Secreta do Cientista de Dados
 
Cientista de Dados - A profissão mais sexy do século 21
Cientista de Dados - A profissão mais sexy do século 21Cientista de Dados - A profissão mais sexy do século 21
Cientista de Dados - A profissão mais sexy do século 21
 
Linked Data on the BBC
Linked Data on the BBCLinked Data on the BBC
Linked Data on the BBC
 
pa-pe-pi-po-pure Python Text Processing
pa-pe-pi-po-pure Python Text Processingpa-pe-pi-po-pure Python Text Processing
pa-pe-pi-po-pure Python Text Processing
 
Python Brasil 2010 - Potter vs Voldemort - Lições ofidiglotas da prática Pyth...
Python Brasil 2010 - Potter vs Voldemort - Lições ofidiglotas da prática Pyth...Python Brasil 2010 - Potter vs Voldemort - Lições ofidiglotas da prática Pyth...
Python Brasil 2010 - Potter vs Voldemort - Lições ofidiglotas da prática Pyth...
 
Python: a arma secreta do Cientista de Dados
Python: a arma secreta do Cientista de DadosPython: a arma secreta do Cientista de Dados
Python: a arma secreta do Cientista de Dados
 

Similar to Linked data at globo.com

Semantic day 2013 linked data at globo.com
Semantic day 2013   linked data at globo.comSemantic day 2013   linked data at globo.com
Semantic day 2013 linked data at globo.com
Semantic Team
 
InfoSec World 2013 – W4 – Using Google to Find Vulnerabilities in Your IT Env...
InfoSec World 2013 – W4 – Using Google to Find Vulnerabilities in Your IT Env...InfoSec World 2013 – W4 – Using Google to Find Vulnerabilities in Your IT Env...
InfoSec World 2013 – W4 – Using Google to Find Vulnerabilities in Your IT Env...
Bishop Fox
 

Similar to Linked data at globo.com (20)

Semantic day 2013 linked data at globo.com
Semantic day 2013   linked data at globo.comSemantic day 2013   linked data at globo.com
Semantic day 2013 linked data at globo.com
 
Rio info 2013 - Linked Data at Globo.com
Rio info 2013 - Linked Data at Globo.comRio info 2013 - Linked Data at Globo.com
Rio info 2013 - Linked Data at Globo.com
 
QCon SP - recommended for you
QCon SP - recommended for youQCon SP - recommended for you
QCon SP - recommended for you
 
Building search and discovery services for Schibsted (LSRS '17)
Building search and discovery services for Schibsted (LSRS '17)Building search and discovery services for Schibsted (LSRS '17)
Building search and discovery services for Schibsted (LSRS '17)
 
A Day in the Life of a Functional Data Scientist
A Day in the Life of a Functional Data ScientistA Day in the Life of a Functional Data Scientist
A Day in the Life of a Functional Data Scientist
 
Dove sono i tuoi vertici e di cosa stanno parlando?
Dove sono i tuoi vertici e di cosa stanno parlando?Dove sono i tuoi vertici e di cosa stanno parlando?
Dove sono i tuoi vertici e di cosa stanno parlando?
 
GAB 2019 - Graph as a data store
GAB 2019 - Graph as a data storeGAB 2019 - Graph as a data store
GAB 2019 - Graph as a data store
 
Simple fuzzy name matching in elasticsearch paris meetup
Simple fuzzy name matching in elasticsearch   paris meetupSimple fuzzy name matching in elasticsearch   paris meetup
Simple fuzzy name matching in elasticsearch paris meetup
 
Boost your data analytics with open data and public news content
Boost your data analytics with open data and public news contentBoost your data analytics with open data and public news content
Boost your data analytics with open data and public news content
 
InfoSec World 2013 – W4 – Using Google to Find Vulnerabilities in Your IT Env...
InfoSec World 2013 – W4 – Using Google to Find Vulnerabilities in Your IT Env...InfoSec World 2013 – W4 – Using Google to Find Vulnerabilities in Your IT Env...
InfoSec World 2013 – W4 – Using Google to Find Vulnerabilities in Your IT Env...
 
Sound cloud - User & Partner Conference - AT Internet
Sound cloud - User & Partner Conference - AT InternetSound cloud - User & Partner Conference - AT Internet
Sound cloud - User & Partner Conference - AT Internet
 
GeoLinkedData
GeoLinkedDataGeoLinkedData
GeoLinkedData
 
Open Data and News Analytics Demo
Open Data and News Analytics DemoOpen Data and News Analytics Demo
Open Data and News Analytics Demo
 
Graph Analytics with ArangoDB
Graph Analytics with ArangoDBGraph Analytics with ArangoDB
Graph Analytics with ArangoDB
 
Data Scientist's Daily Life
Data Scientist's Daily LifeData Scientist's Daily Life
Data Scientist's Daily Life
 
Data Modelling at Scale
Data Modelling at ScaleData Modelling at Scale
Data Modelling at Scale
 
diadem-vldb-2015
diadem-vldb-2015diadem-vldb-2015
diadem-vldb-2015
 
Wimmics Overview 2021
Wimmics Overview 2021Wimmics Overview 2021
Wimmics Overview 2021
 
(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages
 
Pig on spark
Pig on sparkPig on spark
Pig on spark
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Linked data at globo.com