© Copyright 2015 STI INNSBRUCK www.sti-innsbruck.at
Elias Kärle – 17 April 2015 – Tourism Fast Forward 2015, Mayrhofen, Tyrol
schema.org on Hotel Websites
@eliaska
#tff_15
Contents
1. Motivation
2. Data
3. Analysis
1. Motivation
• Dieter Fensel has a Wikipedia entry
• Italian swimmer vs. @cyberandy
• How did he do it?
• schema.org annotations
• Hotel industry and tourism
→ are annotations being used?
1) How many hotels use schema.org?
2) How is schema.org used?
   1) Which classes?
   2) Which attributes?
   3) Is schema.org used correctly?
3) Who uses schema.org in the tourism sector?
2. Data
What is schema.org?
• Initiative founded in 2011
• Ontology for structuring data on web pages
• Embedded in HTML via
– Microdata
– RDFa
– JSON-LD
Source: http://www.schema.org
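Of the three syntaxes, JSON-LD is the easiest to pull out of a page. It sits in its own script element instead of being spread across HTML attributes. The following is a minimal sketch using only the Python standard library. The sample page and the hotel name are made up for illustration. Microdata and RDFa, which live in HTML attributes, are not handled by this sketch.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._inside = False   # currently inside a JSON-LD script element?
        self._chunks = []      # text chunks of the current script element
        self.items = []        # parsed JSON-LD objects, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False
            self.items.append(json.loads("".join(self._chunks)))


def extract_jsonld(html):
    """Return all JSON-LD objects embedded in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical sample page with one Hotel annotation.
page = """<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Hotel", "name": "Hotel Beispiel"}
</script>
</head><body><h1>Hotel Beispiel</h1></body></html>"""

print(extract_jsonld(page))
```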
Analysis of all websites: Common Crawl
• Founded in 2007
• Non-profit organisation
• Crawls the web four times a year
• Datasets are freely accessible
• November 2013: 2.3 billion web pages, 148 TB
• December 2014: 2.1 billion web pages, 160 TB
Source: http://commoncrawl.org/the-data/get-started/
Reduction to structured data:
WebDataCommons:
• 2012: Freie Universität Berlin & KIT
• Currently at the University of Mannheim
• Led by Chris Bizer
• Extracts all structured data from Common Crawl:
– Web tables: 147 million relational tables (out of 11 billion HTML tables)
– Hyperlink graph: 3.5 billion web pages, 128 billion links
– Semantically annotated data:
• November 2013: 44 TB, 2.2 billion URLs
• December 2014: 160 TB, 2 billion URLs
Source: http://webdatacommons.org/structureddata/
• November 2013 dataset
• Subset: schema.org/Hotel
– 35 GB
– 127 million triples
• OWLIM-SE repository
• SPARQL queries
• Linux Debian 3.2, STI
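The study ran SPARQL queries on an OWLIM-SE repository. The sketch below is a rough, dependency-free stand-in for that step. It filters WebDataCommons-style N-Quads down to entities typed as schema.org/Hotel, which roughly corresponds to `SELECT (COUNT(DISTINCT ?h) AS ?n) WHERE { ?h a <http://schema.org/Hotel> }`. It assumes that, as in the WebDataCommons dumps, the fourth element of each quad is the URL of the page the data came from. The regular expression is a simplification, not a complete N-Quads parser, and the sample lines are invented.

```python
import re

# Simplified N-Quads pattern: subject, predicate, object, graph, final dot.
# Covers IRIs, blank nodes and plain/typed/language-tagged literals; it does
# not unescape literals or validate IRIs.
QUAD = re.compile(
    r'^(<[^>]*>|_:\S+)\s+'                                                   # subject
    r'(<[^>]*>)\s+'                                                          # predicate
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)\s+'   # object
    r'(<[^>]*>)\s*\.\s*$'                                                    # graph (source page)
)

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
HOTEL = "<http://schema.org/Hotel>"


def parse_quad(line):
    """Return (subject, predicate, object, graph), or None if the line does not match."""
    match = QUAD.match(line.strip())
    return match.groups() if match else None


def hotel_entities(lines):
    """Collect (subject, source page) pairs typed as schema.org/Hotel.

    Blank-node labels are only meaningful within one source, so the pair,
    not the subject alone, identifies an annotated entity here."""
    hotels = set()
    for line in lines:
        quad = parse_quad(line)
        if quad is None:
            continue
        subject, predicate, obj, graph = quad
        if predicate == RDF_TYPE and obj == HOTEL:
            hotels.add((subject, graph))
    return hotels


sample = [
    '_:n1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Hotel> <http://hotel-adler.example/> .',
    '_:n1 <http://schema.org/Hotel/name> "Hotel Adler" <http://hotel-adler.example/> .',
    '_:n1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Hotel> <http://booking.example/adler> .',
    '_:n7 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Place> <http://booking.example/adler> .',
]
print(len(hotel_entities(sample)))
```

Because the same label `_:n1` occurs on two different pages, it counts as two annotated entities. This already hints at the double annotation (own site plus booking site) discussed in the analysis.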
3. Analysis
1) How many hotels are annotated with schema.org?
4,841,353
• Hotels are annotated twice:
– on their own website
– on booking websites
→ 740,298 after deduplication
• But all hotels sharing the same name are lost
– Adler, Post, ...
→ Bind to the address!
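The deduplication problem on this slide comes down to the choice of grouping key. Grouping by name alone merges different hotels that share a common name such as "Adler". Grouping by name plus address keeps them apart. This is a minimal sketch. The record fields (`name`, `street`, `postal_code`, `source`) and the sample hotels are hypothetical, and the slides do not say exactly how the 740,298 figure was computed.

```python
from collections import defaultdict


def normalise(text):
    """Lower-case and collapse whitespace so trivial spelling variants match."""
    return " ".join(text.lower().split())


def group_hotels(records, use_address=True):
    """Group annotation records that presumably describe the same hotel.

    records: dicts with the hypothetical keys "name", "street", "postal_code"
    and "source" (the page the annotation was found on). With
    use_address=False, hotels are merged by name alone, which also merges
    different hotels that merely share a common name."""
    groups = defaultdict(list)
    for record in records:
        key = (normalise(record["name"]),)
        if use_address:
            key += (normalise(record.get("street", "")), record.get("postal_code", "").strip())
        groups[key].append(record["source"])
    return dict(groups)


records = [
    {"name": "Hotel Adler", "street": "Hauptstraße 1", "postal_code": "6020",
     "source": "http://hotel-adler.example/"},
    {"name": "Hotel  Adler", "street": "Hauptstraße 1", "postal_code": "6020",
     "source": "http://booking.example/adler-innsbruck"},
    {"name": "Hotel Adler", "street": "Dorfplatz 3", "postal_code": "6370",
     "source": "http://booking.example/adler-kitzbuehel"},
]

print(len(group_hotels(records, use_address=False)))  # name only: both Adler hotels merged
print(len(group_hotels(records)))                     # name + address: two distinct hotels
```

With the address in the key, the own-site and booking-site entries of the Innsbruck hotel still collapse into one group. The second "Adler" in a different town stays separate.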
Annotated hotels and their properties:
Hotel: 4,841,353
Address: 3,035,000
Country: 1,904,000
Name: 1,125,000
Region: 1,902,000
Postal code: 2,011,000
Street: 2,284,000
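To put these counts in proportion, each one can be divided by the total number of annotated hotels. This assumes that each figure counts hotels carrying the property at least once. The slide does not say whether hotels or triples are counted, and the property counts appear rounded, so the resulting percentages are indicative only.

```python
# Counts as shown on the slide (the property counts appear to be rounded).
HOTELS = 4_841_353
PROPERTY_COUNTS = {
    "address": 3_035_000,
    "country": 1_904_000,
    "name": 1_125_000,
    "region": 1_902_000,
    "postal code": 2_011_000,
    "street": 2_284_000,
}


def coverage(counts, total):
    """Percentage of annotated hotels carrying each property, to one decimal."""
    return {prop: round(100 * count / total, 1) for prop, count in counts.items()}


# Print properties from most to least frequently used.
for prop, share in sorted(coverage(PROPERTY_COUNTS, HOTELS).items(), key=lambda kv: -kv[1]):
    print(f"{prop:12} {share:5.1f} %")
```

Under that assumption, roughly 37% of the hotel annotations carry no address at all. Those hotels cannot be bound to an address for deduplication.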
Hotels per country
Austria: 148
Tyrol: 287
Innsbruck: 63
1. US 1,021,513
2. CA 52,360
3. CN 20,648
4. GB 11,580
5. DE 3,163
6. MX 1,921
7. PR 1,250
8. AR 1,016
9. PH 765
10. IN 699
11. TR 681
12. AE 391
13. KR 377
14. RO 373
15. QA 343
16. PA 299
17. SA 292
18. AU 290
19. BR 258
20. CH 238
21. TH 234
22. SR 217
23. HK 156
24. EC 150
25. AT 148
26. CO 143
27. PE 129
28. BE 127
29. ID 109
30. BH 93
→ Obviously not annotated correctly
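A region cannot contain more hotels than the country it lies in. Yet the figures on this slide give Tyrol 287 hotels and Austria (AT) only 148, which makes "obviously not annotated correctly" concrete. One possible cause is that many hotels omit the country or write it inconsistently, so they are missed when counting by country code. That cause is an assumption, not something the slides state. The sketch below only automates the consistency check, using the counts from the slide and the geographic nesting Innsbruck ⊂ Tyrol ⊂ Austria.

```python
# Counts from the slide; keys mirror the labels used there.
COUNTS = {"AT": 148, "Tirol": 287, "Innsbruck": 63}
# Geographic nesting: each area maps to the area that encloses it.
PARENT = {"Tirol": "AT", "Innsbruck": "Tirol"}


def hierarchy_violations(counts, parent):
    """Return (area, enclosing area) pairs where the smaller area has more hotels."""
    return [
        (child, outer)
        for child, outer in parent.items()
        if counts[child] > counts[outer]
    ]


print(hierarchy_violations(COUNTS, PARENT))
```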
Hotels by postal code in Tyrol
6020 Innsbruck: 18%
6370 Kitzbühel: 10%
6100 Seefeld: 8%
6450 Sölden: 4%
6580 St. Anton: 4%
6456 Obergurgl: 3%
6215 Achenkirch: 2%
6213 Pertisau: 2%
6365 Kirchberg: 2%
6010: 2%
Other: 45%
Which hotel categories are annotated?
http://schema.org/Rating
Hotel: 4,841,353
Address: 3,035,000
Country: 1,904,000
Name: 1,125,000
Region: 1,902,000
Rating: 2,377,000
Rating value: 2,375,000
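Which hotel categories are annotated?
[Chart: number of hotels per annotated category; the category labels are not recoverable. Values in chart order: 866,932; 651,606; 426,925; 176,800; 135,958; 35,079; 66,208; 15,476; 941]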
2) How is schema.org used?
schema.org usage by property:
http://schema.org/Hotel/name: 15%
http://schema.org/Hotel/review: 14%
http://www.w3.org/1999/02/22-rdf-syntax-ns#type: 13%
http://schema.org/Hotel/image: 9%
http://schema.org/Hotel/address: 8%
http://schema.org/Hotel/aggregateRating: 7%
http://schema.org/Hotel/rating: 7%
http://schema.org/Hotel/description: 5%
http://schema.org/Hotel/url: 5%
http://schema.org/Hotel/geo: 4%
Other: 13%
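Shares like those in this chart can be derived by counting predicates over the Hotel subset. Everything outside the top entries is lumped into "Other". This is a minimal sketch with plain (subject, predicate, object) tuples and invented sample data. Whether the chart counts triples or entities per property is an assumption here.

```python
from collections import Counter

NAME = "http://schema.org/Hotel/name"
TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GEO = "http://schema.org/Hotel/geo"


def property_shares(triples, top_n=10):
    """Share (in whole percent) of triples per predicate, plus an "Other" bucket.

    Rounded shares need not add up to exactly 100."""
    counts = Counter(predicate for _, predicate, _ in triples)
    total = sum(counts.values())
    top = counts.most_common(top_n)
    shares = [(predicate, round(100 * n / total)) for predicate, n in top]
    rest = total - sum(n for _, n in top)
    if rest:
        shares.append(("Other", round(100 * rest / total)))
    return shares


sample = [
    ("_:h1", TYPE, "http://schema.org/Hotel"),
    ("_:h1", NAME, "Hotel Adler"),
    ("_:h2", TYPE, "http://schema.org/Hotel"),
    ("_:h2", NAME, "Hotel Post"),
    ("_:h3", NAME, "Hotel Krone"),
    ("_:h3", GEO, "_:g3"),
]
print(property_shares(sample, top_n=2))
```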
3) Who uses schema.org in the tourism sector?
Hypothesis:
"Schema.org is used mainly by booking and rating sites,
and hardly at all on hotel websites themselves."
Approach:
• Hotels on booking & rating sites
→ search for annotations on the hotel's own website
• Cross-check with annotated hotel websites
→ multiple occurrences in the dataset?
Currently: exemplary (top booking sites)
Next step: full dataset
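The cross-check described in this approach can be sketched as set arithmetic. Collect the hotels seen on booking or rating portals, collect the hotels annotated anywhere else, and intersect the two sets. The portal host names and the `hotel_key` format are hypothetical placeholders, since the slides name no sites. Treating every non-portal host as the hotel's own website is a simplification.

```python
from urllib.parse import urlparse

# Hypothetical host list; the slides only mention "top booking sites".
PORTAL_HOSTS = {"booking.example", "ratings.example"}


def host(url):
    """Host name of a URL without a leading 'www.'."""
    name = urlparse(url).hostname or ""
    return name[4:] if name.startswith("www.") else name


def own_site_check(records):
    """records: (hotel_key, source_url) pairs, hotel_key e.g. "name|postal code".

    Returns (hotels found on portals, of which also annotated on a non-portal site)."""
    on_portal, elsewhere = set(), set()
    for hotel_key, url in records:
        if host(url) in PORTAL_HOSTS:
            on_portal.add(hotel_key)
        else:
            elsewhere.add(hotel_key)
    return len(on_portal), len(on_portal & elsewhere)


records = [
    ("hotel adler|6020", "http://www.booking.example/adler"),
    ("hotel adler|6020", "http://hotel-adler.example/"),
    ("hotel post|6370", "https://ratings.example/post"),
]
print(own_site_check(records))
```

A low second number relative to the first would support the hypothesis that annotations come mostly from portals.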
Summary:
• Main users of schema.org/Hotel:
booking and rating sites
Errors:
• Incomplete
• Wrong classes
• Wrong attributes
• Wrong datatypes
Full error analysis: University of Mannheim
(R. Meusel & H. Paulheim) [1]
[1] R. Meusel, H. Paulheim: Heuristics for Fixing Common Errors in Deployed schema.org Microdata. ESWC 2015. http://dws.informatik.uni-mannheim.de/fileadmin/lehrstuehle/ki/pub/MeuselPaulheim-HeuristicsForFixingCommonErrorsInDeployedSchemaOrgMicrodata-ESWC2015.pdf
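The four error classes above can be illustrated with simple checks on a single JSON-LD object. This is not the heuristic approach of Meusel & Paulheim [1]. It is a toy sketch whose property list and notion of "required" properties are assumptions. The datatype check relies on the fact that schema.org expects an AggregateRating object, not a plain literal, as the value of `aggregateRating`.

```python
# Hypothetical, deliberately small excerpt of the vocabulary; a real check
# would load the complete schema.org type and property definitions.
HOTEL_PROPERTIES = {
    "name", "address", "telephone", "url", "image",
    "description", "geo", "review", "aggregateRating",
}
REQUIRED = {"name", "address"}  # assumption: what "complete" means in this sketch


def check_hotel(item):
    """Return a list of problems found in one JSON-LD object meant to be a Hotel."""
    problems = []
    if item.get("@type") != "Hotel":                      # wrong class
        problems.append(f"unexpected class: {item.get('@type')!r}")
    for prop in item:                                     # wrong attributes
        if not prop.startswith("@") and prop not in HOTEL_PROPERTIES:
            problems.append(f"unknown property: {prop}")
    for prop in sorted(REQUIRED - set(item)):             # incomplete
        problems.append(f"missing property: {prop}")
    rating = item.get("aggregateRating")                  # wrong datatype
    if rating is not None and not isinstance(rating, dict):
        problems.append("aggregateRating is a literal, expected an AggregateRating object")
    return problems


broken = {
    "@context": "http://schema.org",
    "@type": "hotel",                       # class names are case-sensitive
    "name": "Hotel Post",
    "adress": "Dorfplatz 1, 6370 Kitzbühel",  # misspelled property
    "aggregateRating": "4.5",               # literal instead of an object
}
for problem in check_hotel(broken):
    print(problem)
```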
The "Hotel" annotation is correct → but it appears on EVERY subpage!
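Repeating the Hotel markup on every subpage is valid. In a crawl, however, each page presumably yields its own Hotel entity, which inflates raw counts unless the pages of one site are grouped. This is a sketch of such grouping by host and hotel name. The sample URLs are invented, and matching on the exact name string is a simplifying assumption.

```python
from collections import defaultdict
from urllib.parse import urlparse


def repeated_entities(records, min_pages=2):
    """records: (hotel_name, page_url) pairs taken from Hotel annotations.

    Returns {(host, hotel_name): number of distinct pages} for every hotel
    annotated on at least min_pages pages of the same website."""
    pages = defaultdict(set)
    for name, url in records:
        pages[(urlparse(url).hostname, name)].add(url)
    return {key: len(urls) for key, urls in pages.items() if len(urls) >= min_pages}


records = [
    ("Hotel Adler", "http://hotel-adler.example/"),
    ("Hotel Adler", "http://hotel-adler.example/zimmer"),
    ("Hotel Adler", "http://hotel-adler.example/kontakt"),
    ("Hotel Post", "http://hotel-post.example/"),
]
print(repeated_entities(records))
```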
Use schema.org and annotate correctly:
• RDFa, Microdata, JSON-LD
• Documentation: http://www.schema.org
• Testing: https://developers.google.com/structured-data/testing-tool/
"Be part of the graph!"
Google, Bing, Yahoo! & Yandex
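As an illustration of correct annotation, here is a minimal sketch that emits a schema.org Hotel in JSON-LD. The hotel is bound to a PostalAddress, as the analysis recommends for telling same-named hotels apart, and has an optional AggregateRating. The function name and the sample values are hypothetical. The output is meant to be pasted into a page and checked with a structured-data testing tool such as the one linked on this slide.

```python
import json


def hotel_jsonld(name, street, postal_code, locality, country,
                 rating_value=None, review_count=None):
    """Build a <script> element containing a schema.org Hotel in JSON-LD."""
    data = {
        "@context": "http://schema.org",
        "@type": "Hotel",
        "name": name,
        "address": {                      # bind the hotel to its address
            "@type": "PostalAddress",
            "streetAddress": street,
            "postalCode": postal_code,
            "addressLocality": locality,
            "addressCountry": country,
        },
    }
    if rating_value is not None:
        rating = {"@type": "AggregateRating", "ratingValue": rating_value}
        if review_count is not None:
            rating["reviewCount"] = review_count
        data["aggregateRating"] = rating
    body = json.dumps(data, ensure_ascii=False, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(hotel_jsonld("Hotel Beispiel", "Hauptstraße 1", "6020", "Innsbruck", "AT",
                   rating_value=4.5, review_count=120))
```

`ensure_ascii=False` keeps characters such as "ß" readable in the page source. The rating is only emitted when a value is supplied, so the markup contains no empty or null properties.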
TFF2015, Elias Kärle, STI Innsbruck, "Verbreitung von schema.org auf Hotelwebseiten" (The spread of schema.org on hotel websites)