This document discusses how RDFa helps with publishing and distributing data on the web. It provides examples of how RDFa can be used to embed reviews, product information, and chemical data in HTML pages. This allows the data to be extracted and used for vertical search engines, customizing user interfaces by linking to other datasets, and distributing data to other applications and services.
20. Pushing the limits
• Blogs where each entry is an item for sale.
• Blogs where each entry is a review.
• Blogs where entries contain specialist information.
• Web-pages that want to be data.
26. Title: Chanda’s Secrets by Allan Stratton
Stars: *****
I reviewed Stratton's newest teen novel, Leslie's Journal in
October. I'd heard about Chanda's Secrets and wanted to give
it a try. ...
27. <div xmlns:v="http://rdf.data-vocabulary.org/#"
typeof="v:Review"
>
<span rel="v:itemreviewed">
<span>
Title: <span property="v:name">Chanda’s Secrets</span>
</span>
</span>
Stars: <span property="v:rating" content="5">*****</span>
<span property="v:summary">
I reviewed Stratton's newest teen novel, Leslie's Journal in October.
I'd heard about Chanda's Secrets and wanted to give it a try. ...
</span>
</div>
28. Number of stars, price range, etc., all picked up from the web-page via RDFa and
29. How RDFa is helping
Joining the linked data cloud
30. <div
typeof="v:Review"
>
<span rel="v:itemreviewed">
<span about="urn:ISBN:1550378341" typeof="bib:book">
Title: <span property="v:name">Chanda’s Secrets</span>
</span>
</span>
Stars: <span property="v:rating" content="5">*****</span>
<span property="v:summary">
I reviewed Stratton's newest teen novel, Leslie's Journal in October.
I'd heard about Chanda's Secrets and wanted to give it a try. ...
</span>
</div>
31.
32. Mark Birbeck’s Twitter account name is
<span
typeof="foaf:OnlineAccount"
rel="foaf:accountServiceHomePage" resource="http://twitter.com"
property="foaf:accountName"
>markbirbeck</span>
and Ben Adida’s is
<span
typeof="foaf:OnlineAccount"
rel="foaf:accountServiceHomePage" resource="http://twitter.com"
property="foaf:accountName"
>benadida</span>
40. <div about="#chem_123">
Viagra has the following CID identifier:
<span property="chem:cid">5281023</span>
</div>
<div about="#chem_124">
Methane has the following identifier:
<span property="chem:inchi">InChI=1/CH4/h1H4</span>
</div>
41. • Embedding RDFa about chemical symbols makes it possible to:
• create a chemistry-specific search engine;
• improve the UI for the blog.
49. RDFa
• Vertical search engines
• Customise the UI by joining the linked data cloud
• Makes it easy for organisations to publish data
• Makes it easy for individuals to publish data
It’s easy to publish longer documents via blog posts.
But you can even publish a web-page from an SMS message. (Every Tweet has a URL.)
Publishing data is difficult. We’ll now look at some scenarios where we would like to publish precise information, rather than just publishing text.
We usually have to use some specialised site to sell items.
For example, eBay.
Yet the enormous reach of HTML and HTTP means that even small vendors are able to reach an international audience through the web, and need little more than a blog. This site has simple contact information in one blog post...
...and then an individual item for sale in each subsequent blog post. It’s not eBay, but it gets them business.
However, it’s difficult to publish in such a way that your post is aligned with others -- so that we’re all talking about the same restaurant, for example. To do this, we usually have to join a centralised web-site.
Which then raises the issue of who owns your data -- a big question for a lot of people.
For example, on this site each blog post is a film review; the reviewer would like to keep the reviews on their site.
Similarly, on this site each blog post is a book review.
This site has worked hard to produce good reviews that people link to, and it shows in their high Google rank. So why would the blog-owner bother to subsume their reviews into some generalised review site?
Another example of where we want to publish more precise information is in specialist sites.
If Marie Curie were researching today, she might well use a blog. She wouldn’t be writing about her breakfast, though...
...she’d be writing about her research. And she’d want to use precise terminology.
We can see that people are already using blogs as a convenient way to publish quite specific types of information.
I’m not saying that blogging is the future; I’m just using blogging as a shorthand for ‘easy HTML publishing’. If anyone can set up a blog, and we can get metadata into blogs, then it follows that anyone can publish metadata.
The key point. (Again.)
So we’ve seen some of the problems that we’re trying to address; now we’ll look at how RDFa helps us to address them.
We saw earlier how on this site each blog post is a book review. We can see that the core values of this review are the title of the item being reviewed, the rating, and the comments.
This is what the review would look like, marked up with Google’s review vocabulary in RDFa. We still have the title for the book, the rating and the summary, but now it’s formatted in such a way that Google’s crawlers can understand this as being more than just some text.
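To give a feel for what a crawler gains from this markup, here is a minimal Python sketch that pulls the property values out of the review. It is an illustration only, not how Google's crawlers work and not a conforming RDFa processor: it ignores prefixes, subjects and chaining, and it assumes every element has a closing tag. The names REVIEW_HTML, PropertyCollector and extract_properties are made up for this example.

```python
from html.parser import HTMLParser

# The review markup from the slide, reproduced as a string (summary kept as on the slide).
REVIEW_HTML = """
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Review">
  <span rel="v:itemreviewed">
    <span>Title: <span property="v:name">Chanda’s Secrets</span></span>
  </span>
  Stars: <span property="v:rating" content="5">*****</span>
  <span property="v:summary">
    I reviewed Stratton's newest teen novel, Leslie's Journal in October.
    I'd heard about Chanda's Secrets and wanted to give it a try. ...
  </span>
</div>
"""


class PropertyCollector(HTMLParser):
    """Collects property values. Hypothetical helper, NOT a full RDFa processor."""

    def __init__(self):
        super().__init__()
        self.values = {}
        # One entry per open element: the property whose text we are
        # collecting, or None. Assumes every start tag has an end tag.
        self._stack = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop and attrs.get("content") is not None:
            # @content takes precedence over the visible text ("*****").
            self.values[prop] = attrs["content"]
            prop = None
        elif prop:
            self.values[prop] = ""
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text belongs to the innermost open element that carries @property.
        for prop in reversed(self._stack):
            if prop:
                self.values[prop] += data
                break


def extract_properties(html):
    """Return a dict of property name to whitespace-normalised value."""
    parser = PropertyCollector()
    parser.feed(html)
    parser.close()
    return {name: " ".join(value.split()) for name, value in parser.values.items()}


if __name__ == "__main__":
    for name, value in extract_properties(REVIEW_HTML).items():
        print(name, "=", value)
```

The point to notice is the rating: a human reads five asterisks, while the program reads the value 5 from the content attribute.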
A consequence for authors of surfacing this data is improved presentation on search engines. This is a restaurant review that has been marked up using Google’s extra features, with the results shown as a ‘rich snippet’, but there is no reason that it can’t be a book or film.
Another example of how RDFa helps is in the realm of linked data. Once the data on the page has been ‘understood’, we can go off and find more information from the linked data cloud. We can do this because we have accurate identifiers in the page.
For example, if we add an identifier for the book to the book review, we can use it to go out to the linked data cloud and get the full book title, a picture of the book cover, and so on.
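As a rough sketch of the first half of that step, the Python below finds the identifier that the page gives for the book and extracts the ISBN from it. The lookup itself is deliberately left out, because which linked data service to query is not specified here. The names BOOK_REVIEW_HTML, SubjectCollector, find_subjects and isbn_from_urn are hypothetical, and the code only reads the about and typeof attributes rather than processing RDFa fully.

```python
from html.parser import HTMLParser

# The book review from the slide, reduced to the part that carries the identifier.
BOOK_REVIEW_HTML = """
<div typeof="v:Review">
  <span rel="v:itemreviewed">
    <span about="urn:ISBN:1550378341" typeof="bib:book">
      Title: <span property="v:name">Chanda’s Secrets</span>
    </span>
  </span>
</div>
"""


class SubjectCollector(HTMLParser):
    """Records every explicit subject (@about) together with its @typeof, if any."""

    def __init__(self):
        super().__init__()
        self.subjects = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("about") is not None:
            self.subjects.append((attrs["about"], attrs.get("typeof")))


def find_subjects(html):
    """Return a list of (identifier, type) pairs found in the markup."""
    parser = SubjectCollector()
    parser.feed(html)
    parser.close()
    return parser.subjects


def isbn_from_urn(identifier):
    """Return the ISBN part of a urn:ISBN:... identifier, or None otherwise."""
    prefix = "urn:isbn:"
    if identifier.lower().startswith(prefix):
        return identifier[len(prefix):]
    return None


if __name__ == "__main__":
    for identifier, rdf_type in find_subjects(BOOK_REVIEW_HTML):
        print(identifier, rdf_type, isbn_from_urn(identifier))
```

With the ISBN in hand, any service that is keyed on ISBNs can be asked for the full title or the cover image; without the identifier we would be left guessing from the text of the title.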
Retrieving book information from the linked data cloud.
Similarly, we can go off to the linked data cloud to get the latest Tweets by a person.
Retrieving Tweets from the linked data cloud.
Another consequence of having more precise information in the page is vertical search.
If we search Google for ‘benzene’ then we will get very general results, useful for the public (e.g., Wikipedia) but of no interest to a chemist.
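A vertical search engine can instead index the precise identifiers that the pages carry. The Python below is a minimal sketch of that idea using the chemistry markup shown on the earlier slide. It assumes a made-up page URL (http://example.org/blog/post), treats only properties beginning with chem: as interesting, and takes shortcuts that a real RDFa processor would not: a subject stays current until the next about attribute, and property elements are assumed to contain plain text only. ChemCollector and build_index are hypothetical names.

```python
from collections import defaultdict
from html.parser import HTMLParser

# The chemistry markup from the slide.
CHEM_HTML = """
<div about="#chem_123">
  Viagra has the following CID identifier:
  <span property="chem:cid">5281023</span>
</div>
<div about="#chem_124">
  Methane has the following identifier:
  <span property="chem:inchi">InChI=1/CH4/h1H4</span>
</div>
"""


class ChemCollector(HTMLParser):
    """Collects (subject, property, value) for chem:* properties. A sketch only."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subject = None  # most recent @about seen (simplification)
        self._prop = None     # chem:* property currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("about") is not None:
            self._subject = attrs["about"]
        prop = attrs.get("property") or ""
        if prop.startswith("chem:"):
            self._prop = prop
            self._text = []

    def handle_data(self, data):
        if self._prop:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes the property element has no child elements.
        if self._prop:
            value = "".join(self._text).strip()
            self.triples.append((self._subject, self._prop, value))
            self._prop = None


def build_index(pages):
    """Map (property, value) to the list of (page URL, subject) that state it."""
    index = defaultdict(list)
    for url, html in pages.items():
        parser = ChemCollector()
        parser.feed(html)
        parser.close()
        for subject, prop, value in parser.triples:
            index[(prop, value)].append((url, subject))
    return index


if __name__ == "__main__":
    index = build_index({"http://example.org/blog/post": CHEM_HTML})
    print(index[("chem:cid", "5281023")])
```

A chemist searching such an index by CID or InChI gets only the pages that are about that exact compound, which is the kind of result a general-purpose text search cannot promise.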