semantic markup using

A basic intro to microdata and, along with a new extension for datasets and data catalogs. "TWed" talk April 4, 2012.

Published in: Education, Technology

  1. Joshua Shinavier
     Wednesday Nights in the Tetherless World (TWed), April 4th, 2012
  2. Outline
     • rich snippets
       - microformats
       - RDFa
       - microdata
     • microdata syntax
     •
       - deployment
       - mappings, tools, extensions
     • the Dataset extension
  4. the three syntaxes
     • several solutions for embedding semantic data in Web pages
     • three syntaxes known (by Google) as “rich snippets”
       - microformats
       - RDFa
       - HTML microdata
     • all three are supported by Google, while
       - microdata is the “recommended” syntax
  5. First came microformats
     • microformats emerged around 2005
     • some key principles
       - start by solving simple, specific problems
       - design for humans first, machines second
     • wide deployment
       - used on billions of Web pages
       - usage share was at 94% vis-a-vis competing formats (before microdata, anyway)
     • formats exist for marking up Atom feeds, calendars, addresses and contact info, geo-location, multimedia, news, products, recipes, reviews, resumes, social relationships, etc.
  6. microformats example
     <div class="vcard">
       <a class="fn org url" href="">CommerceNet</a>
       <div class="adr">
         <span class="type">Work</span>:
         <div class="street-address">169 University Avenue</div>
         <span class="locality">Palo Alto</span>,
         <abbr class="region" title="California">CA</abbr>&nbsp;&nbsp;
         <span class="postal-code">94301</span>
         <div class="country-name">USA</div>
       </div>
       <div class="tel">
         <span class="type">Work</span> +1-650-289-4040
       </div>
       <div>Email: <span class="email"></span></div>
     </div>
  7. then came RDFa
     • RDFa aims to bridge the gap between human-oriented HTML and machine-oriented RDF documents
     • provides XHTML attributes to indicate machine-understandable information
     • uses the RDF data model and Semantic Web vocabularies directly
  8. RDFa example
     <div typeof="foaf:Person" xmlns:foaf="">
       <p property="foaf:name">Alice Birpemswick</p>
       <p>Email: <a rel="foaf:mbox" href=""></a></p>
       <p>Phone: <a rel="foaf:phone" href="tel:+1-617-555-7332">+1 617.555.7332</a></p>
     </div>
  9. last but not least, microdata
     • microdata syntax is based on nested groups of name-value pairs
     • HTML microdata specification includes
       - an unambiguous parsing model
       - an algorithm to convert microdata to RDF
     • compatible with the Semantic Web via mappings
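The "nested groups of name-value pairs" model and its relation to RDF can be sketched in a few lines of Python. This is a simplified illustration, not the conversion algorithm from the HTML microdata specification: the dict layout (`id`, `type`, `properties`), the `rdf:type` predicate label, the `ex:Person` type, and the sample address are all assumptions made for the example, and property names are left unexpanded.

```python
from itertools import count


def item_to_triples(item, ids=None, triples=None):
    """Flatten a nested item into (subject, predicate, object) triples.

    `item` is a dict with optional "id" and "type" keys and a "properties"
    dict that maps each property name to a list of values. A value is
    either a string or another item dict. Items without an "id" receive a
    blank-node label such as "_:b0".
    """
    ids = ids if ids is not None else count()
    triples = triples if triples is not None else []
    subject = item.get("id") or f"_:b{next(ids)}"
    if "type" in item:
        triples.append((subject, "rdf:type", item["type"]))
    for name, values in item.get("properties", {}).items():
        for value in values:
            if isinstance(value, dict):
                # A nested item becomes the object of the linking triple.
                child, _ = item_to_triples(value, ids, triples)
                triples.append((subject, name, child))
            else:
                triples.append((subject, name, value))
    return subject, triples


# Hypothetical sample item: one text property and one nested item.
person = {
    "type": "ex:Person",
    "properties": {
        "name": ["Daniel"],
        "address": [{"properties": {"city": ["Troy"]}}],
    },
}
root, triples = item_to_triples(person)
for triple in triples:
    print(triple)
```

Each group of name-value pairs becomes one subject, and nesting becomes a triple whose object is the nested item's identifier.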
  11. microdata properties
      • annotate an item with text-valued properties using the “itemprop” attribute
      <div itemscope>
        <p>My name is <span itemprop="name">Daniel</span>.</p>
      </div>
  12. multiple values are OK
      • as in RDF, an item (subject) can have several properties with the same name (predicate) and different values (objects)
      <div itemscope>
        <p>Flavors in my favorite ice cream:</p>
        <ul>
          <li itemprop="flavor">Lemon sorbet</li>
          <li itemprop="flavor">Apricot sorbet</li>
        </ul>
      </div>
  13. item types
      • these correspond to classes in RDF
      <section itemscope itemtype="">
        <h1 itemprop="name">Hedral</h1>
        <p itemprop="desc">Hedral is a male american domestic shorthair, with a fluffy black fur with white paws and belly.</p>
        <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months">
      </section>
  14. global IDs
      • items may be given global identifiers, which are URLs
      • they may be, but do not need to be, Semantic Web URIs
      <dl itemscope itemtype="" itemid="urn:isbn:0-330-34032-8">
        <dt>Title
        <dd itemprop="title">The Reality Dysfunction
        <dt>Author
        <dd itemprop="author">Peter F. Hamilton
        <dt>Publication date
        <dd><time itemprop="pubdate" datetime="1996-01-26">26 January 1996</time>
      </dl>
  16. the vocabulary
      • is one of a number of microdata vocabularies
      • it is a shared collection of microdata schemas for use by webmasters
      • includes a type hierarchy, like an RDFS schema
        - starts with top-level Thing and DataType types
        - properties are inherited by descendant types
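Property inheritance along a type hierarchy can be illustrated with a small Python sketch. Only the top-level `Thing` type comes from the slide; the `CreativeWork` and `Book` entries and the property lists below are a heavily abbreviated, illustrative slice chosen for the example, not a complete or authoritative copy of any vocabulary.

```python
# Illustrative slice of a type hierarchy: child type -> parent type.
PARENT = {
    "Thing": None,
    "CreativeWork": "Thing",
    "Book": "CreativeWork",
}

# Properties declared directly on each type (abbreviated for the example).
OWN_PROPERTIES = {
    "Thing": ["name", "description", "url"],
    "CreativeWork": ["author"],
    "Book": ["isbn"],
}


def all_properties(type_name):
    """Return own and inherited properties, most general type first."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = PARENT[type_name]
    properties = []
    for ancestor in reversed(chain):
        properties.extend(OWN_PROPERTIES[ancestor])
    return properties


book_properties = all_properties("Book")
print(book_properties)
```

Walking up the parent chain is enough here because each type in this sketch has a single parent; a hierarchy with multiple parents would need a graph traversal instead.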
  17. Why should you use
      There are several reasons.
  18. current types (there are around 300 of them)
  19. In terms of deployment...
      ...a few key types stand out.
  20. Top types
      type             occurrences  relative
      Product              5001966  0.27689260175
      PostalAddress        1437388  0.07956913403
      WebPage              1402426  0.07763375119
      Offer                1267545  0.07016717684
      Book                 1111463  0.06152698395
      Person                968737  0.05362613587
      AggregateRating       780967  0.04323179816
      GeoCoordinates        546586  0.03025722678
      LocalBusiness         544662  0.03015072039
      Article               525487  0.02908925463
      Place                 490433  0.02714877897
      Residence             451652  0.02500198869
      ItemPage              421911  0.02335562347
      Organization          405876  0.02246797792
      Blog                  268582  0.01486782772
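The slide does not define the "relative" column. Assuming it is each type's share of all typed items counted in the crawl, the overall total can be recovered from any row as occurrences divided by relative share. The short Python check below does this for the first three rows; the interpretation of the column is an assumption, while the figures are taken from the table.

```python
# (occurrences, relative) pairs copied from the first three table rows.
top_types = {
    "Product": (5001966, 0.27689260175),
    "PostalAddress": (1437388, 0.07956913403),
    "WebPage": (1402426, 0.07763375119),
}

# If "relative" means occurrences / total, every row implies the same total.
implied_totals = {
    name: occurrences / relative
    for name, (occurrences, relative) in top_types.items()
}

for name, total in implied_totals.items():
    print(f"{name}: implied total of about {total:,.0f} items")

spread = max(implied_totals.values()) - min(implied_totals.values())
print(f"spread between rows: {spread:.3f}")
```

The rows agree on a total of roughly 18 million items, which supports reading the column as a simple share of the whole.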
  21. Who’s using it?
      Over 1,000 domains found (through Sindice)
  22. Some early adopters
      domain  occurrences
      3662 2852 2336 2003 2001 1953 1857 1564 1294 1274 1080 1065 1059 1004 1001 0.012028937
  23.
      • maintains ↔ RDF mappings
        - there are mappings for BIBO, DBpedia, Dublin Core, FOAF, GoodRelations, SIOC, and WordNet
      • also provides examples, tutorials, and data dumps
      See:
  24. tools
      • Google’s Rich Snippets Testing Tool
      • libraries are available in Java, JavaScript, Perl, PHP, Python, and Ruby
      • there are modules for Drupal, Joomla!, WordPress, and Virtuoso
      • online tools include microdata extractors, generators and validators
      • supports microdata
      See:
  25. extensions
      • there are dozens of community proposals
        - they extend existing vocabulary
      • several have already been accepted into, incl.
        - Job Postings
        - IPTC/rNews integration
        - User Comments
      • others: Comics, Learning Resources, TV and Radio, Software Application, etc.
  27. motivation: open government data
  28. the Dataset vocabulary: types
      • DataCatalog
        - a collection of datasets
        - e.g. the International Open Government Data catalog
      • Dataset
        - an individual, abstract data set
        - e.g. a data set about seismic hazard zones near San Francisco
      • DataDownload
        - a dataset in downloadable form
        - e.g. an RDF/XML dump of the seismic hazard zones data set
  29. the Dataset vocabulary: properties
      • catalog
        - the catalog containing a dataset
      • dataset
        - a dataset contained in a catalog
      • distribution
        - a data download for a dataset
      • keyword
        - the topic of a dataset
      • spatial
        - the spatial extent of a data set (e.g. United States)
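The three types and five properties above fit together roughly as in the following Python sketch. The class and property names follow the slides; the `name` fields, the `encoding_format` field, the `add` helper, and the sample values are assumptions added so that the example is self-contained.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataDownload:
    """A dataset in downloadable form."""
    name: str
    encoding_format: str  # hypothetical field, not listed on the slides


@dataclass
class Dataset:
    """An individual, abstract data set."""
    name: str
    keyword: List[str] = field(default_factory=list)
    spatial: Optional[str] = None
    distribution: List[DataDownload] = field(default_factory=list)
    # Back-reference to the containing catalog; excluded from repr and
    # comparison so the catalog/dataset cycle stays harmless.
    catalog: Optional["DataCatalog"] = field(default=None, repr=False, compare=False)


@dataclass
class DataCatalog:
    """A collection of datasets."""
    name: str
    dataset: List[Dataset] = field(default_factory=list)

    def add(self, entry: Dataset) -> None:
        # Keep the `dataset` and `catalog` properties consistent.
        entry.catalog = self
        self.dataset.append(entry)


catalog = DataCatalog(name="International Open Government Data catalog")
zones = Dataset(
    name="Seismic Hazard Zones",
    keyword=["seismic hazard"],  # hypothetical keyword
    spatial="United States",
)
zones.distribution.append(
    DataDownload(name="RDF/XML dump", encoding_format="application/rdf+xml")
)
catalog.add(zones)
print(catalog)
```

The `catalog` and `dataset` properties are inverses of each other, which is why the sketch sets both in one place.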
  30. Dataset extension RDF
      • the Dataset extension maps to a subset of the Data Catalog Vocabulary (DCAT)
      • many other types and properties are inherited from
      • collectively, they cover
        - around 2/3 of DCAT, and
        - around half of the Asset Description Metadata Schema (ADMS)
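A mapping of this kind can be pictured as a simple lookup table. The four property pairs in the Python sketch below are read off the microdata and RDFa versions of the Seismic Hazard Zones example in this deck; they are illustrative only and should not be taken as the official mapping. The `translate` helper and the sample `keyword` value are hypothetical.

```python
# Property pairs read off the two versions of the Seismic Hazard Zones example.
# Illustrative only: this is not the official mapping table.
MICRODATA_TO_RDF = {
    "name": "dcterms:title",
    "description": "dcterms:description",
    "spatial": "dcterms:spatial",
    "publisher": "dcterms:publisher",
}


def translate(properties):
    """Split a flat property dict into (mapped, unmapped) parts."""
    mapped, unmapped = {}, {}
    for name, value in properties.items():
        if name in MICRODATA_TO_RDF:
            mapped[MICRODATA_TO_RDF[name]] = value
        else:
            unmapped[name] = value
    return mapped, unmapped


mapped, unmapped = translate({
    "name": "Seismic Hazard Zones",
    "spatial": "United States",
    "keyword": "geology",  # hypothetical value with no counterpart in the table
})
print(mapped)
print(unmapped)
```

Returning the unmapped properties separately makes partial coverage visible instead of silently dropping data.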
  31. Dataset example (microdata)
      <div itemscope="itemscope" itemid="" itemtype="">
        <a href=""><span itemprop="name"> <b>Seismic Hazard Zones</b> </span></a>
        <div>
          <meta itemprop="url" content=""/>
          <span itemprop="description">The dataset represents the Liquefaction and Landslide Zones [...]</span>
        </div>
        <div><i>Country:</i>
          <a href=""><span itemprop="spatial" itemscope="itemscope" itemtype="">
            <span itemprop="name">United States</span>
          </span></a>
        </div>
        <div><i>Publisher:</i>
          <span itemprop="publisher" itemscope="itemscope" itemtype="">
            <span itemprop="name">Department of Technology</span>
          </span>
        </div>
      </div>
  32. Dataset example (RDFa)
      <div about="" typeof="dcat:Dataset">
        <div><b><a href="">
          <span property="dcterms:title">Seismic Hazard Zones</span>
        </a></b></div>
        <div property="dcterms:description">The dataset represents the Liquefaction and Landslide Zones [...]</div>
        <div rel="dcterms:spatial" resource=""><i>Country:</i>
          <a href="">
            <span about="" typeof="adms:Country">
              <span property="dcterms:title">United States</span>
            </span>
          </a>
        </div>
        <div rel="dcterms:publisher"><i>Publisher:</i>
          <span typeof="foaf:Organization">
            <span property="dcterms:title">Department of Technology</span>
          </span>
        </div>
      </div>
  33. Google extracts this data
      Item
        Type:
        name = Seismic Hazard Zones
        url =
        description = The dataset represents the Liquefaction and Landslide Zones [...]
        spatial = Item( 1 )
        publisher = Item( 2 )
      Item 1
        Type:
        name = United States
      Item 2
        Type:
        name = Department of Technology
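An extraction of this shape can be approximated with Python's standard `html.parser` module. The sketch below is a simplified reading of the microdata parsing rules, not Google's implementation and not the full algorithm from the specification: it ignores `itemref`, handles only a few attribute-valued elements, and assumes well-formed markup. The `example.org` URLs in the sample are placeholders.

```python
from html.parser import HTMLParser

VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}
# Elements whose property value comes from an attribute rather than from text.
VALUE_ATTRIBUTE = {"meta": "content", "link": "href", "a": "href", "img": "src"}


class MicrodataParser(HTMLParser):
    """Build nested items from itemscope / itemprop markup (simplified)."""

    def __init__(self):
        super().__init__()
        self.items = []    # top-level items, in document order
        self._scopes = []  # items whose elements are currently open
        self._frames = []  # one frame per open non-void element

    def _add(self, owner, names, value):
        # itemprop may list several space-separated property names.
        for name in names.split():
            owner["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        names = attrs.get("itemprop")
        owner = self._scopes[-1] if self._scopes else None
        frame = {"tag": tag, "names": None, "owner": owner, "text": [], "item": None}
        if "itemscope" in attrs and tag not in VOID_TAGS:
            item = {"type": attrs.get("itemtype"), "id": attrs.get("itemid"),
                    "properties": {}}
            if names and owner is not None:
                self._add(owner, names, item)  # nested item is the value
            else:
                self.items.append(item)
            frame["item"] = item
            self._scopes.append(item)
        elif names and owner is not None:
            if tag in VALUE_ATTRIBUTE:
                self._add(owner, names, attrs.get(VALUE_ATTRIBUTE[tag], ""))
            else:
                frame["names"] = names  # value is the text, added on close
        if tag not in VOID_TAGS:
            self._frames.append(frame)

    def handle_data(self, data):
        for frame in self._frames:
            if frame["names"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or all(f["tag"] != tag for f in self._frames):
            return  # nothing to close
        while True:
            frame = self._frames.pop()
            if frame["item"] is not None:
                self._scopes.pop()
            if frame["names"]:
                text = " ".join("".join(frame["text"]).split())
                self._add(frame["owner"], frame["names"], text)
            if frame["tag"] == tag:
                break


# Shortened version of the Dataset example; the URLs are placeholders.
html = """
<div itemscope itemtype="http://example.org/Dataset">
  <span itemprop="name"><b>Seismic Hazard Zones</b></span>
  <meta itemprop="url" content="http://example.org/seismic"/>
  <span itemprop="spatial" itemscope itemtype="http://example.org/Country">
    <span itemprop="name">United States</span>
  </span>
  <span itemprop="publisher" itemscope itemtype="http://example.org/Organization">
    <span itemprop="name">Department of Technology</span>
  </span>
</div>
"""
parser = MicrodataParser()
parser.feed(html)
parser.close()
dataset = parser.items[0]
print(dataset["properties"]["name"])
print(dataset["properties"]["spatial"][0]["properties"]["name"])
```

As in the listing on the slide, `spatial` and `publisher` hold nested items rather than plain strings.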
  34. Resources
      • HTML microdata
      • W3C Web Schemas group
      • The Dataset proposal
      • Rich Snippets Testing Tool
  35. Credits
      • word clouds by
      • deployment statistics discovered using Sindice and Sindice4j
  36. Thanks!
      • Tetherless World Constellation
      • Contact: @joshsh