The amount of content on the Web is growing, and content on its own is losing value. Web content producers have to offer “content plus something” to be attractive. But what can this “something” be?
As an answer to this question, we introduce the “Internationalization Tag Set (ITS) 2.0”, a standard defined by the World Wide Web Consortium (W3C). ITS 2.0 helps you make your content attractive for translators, search engine providers and many others.
Felix Sasaki - Value beyond content creation - Introducing ITS 2.0; soapconf 2014
1. Sasaki – SOAP! 2014
Value Beyond Content Creation:
Introducing ITS 2.0
Felix Sasaki
DFKI / W3C Fellow
Slides at
http://www.w3.org/Talks/2014/1003-soap-sasaki.pdf
2. Sasaki – SOAP! 2014
For a nice visualization of
what ITS 2.0 is, go here:
“Linguini a la translation:
An Introduction to ITS 2.0”
https://www.youtube.com/watch?v=5Goet3hX6Jo
3. Sasaki – SOAP! 2014
What content authors normally do
• Make money by creating
– Content
– Layout
– Apps
• More and more difficult
– Growing amount of content & apps
– What is the differentiator?
4. Sasaki – SOAP! 2014
What content authors may do
in the future
• Make money by enriching content
– Using automatic tools with manual correction
– Create the basis for further processes
• Translation, search engine optimization,
contextualization, personalization, …
– Authors become content curators
• Background: R&D projects and their results
5. Sasaki – SOAP! 2014
Background 1: LIDER project
http://lider-project.eu/
• EU funded project – aims:
– Demonstrating the value of multilingual linguistic
linked data sources
– Exploring usage scenarios & requirements in
various domains
– Creating an R&D roadmap around the topic
6. Sasaki – SOAP! 2014
Background 2: ITS 2.0
http://www.w3.org/TR/its20/
• W3C standard to foster multilingual content
creation
• Defines metadata (“data categories”) to
support the multilingual content life cycle
• A way to interlink Web content and
multilingual linked data sources
7. Sasaki – SOAP! 2014
ITS 2.0 data categories
• Translate
• Localization Note
• Terminology
• Directionality
• Language Information
• Elements Within Text
• Domain
• Text Analysis
• Locale Filter
• Provenance
• External Resource
• Target Pointer
• ID Value
• Preserve Space
• Localization Quality Issue
• Localization Quality Rating
• MT Confidence
• Allowed Characters
• Storage Size
8. Sasaki – SOAP! 2014
ITS 2.0: High level features
• Can be applied to general XML content and to
HTML5
• Partially natively supported in HTML5
– E.g. HTML5 “translate” attribute
• Applying data categories
– locally: ITS attributes in content
– globally: CSS like selector mechanism, using XPath
• Independent data categories: no need to support
(as tool maker or user) everything
9. Sasaki – SOAP! 2014
Example: “Translate”
local and global
<p>The <span translate=no>World Wide Web Consortium</span>
is making the World Wide Web worldwide!</p>
<its:rules
xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
<its:translateRule selector="//h:code" translate="no"
xmlns:h="http://www.w3.org/1999/xhtml"/>
</its:rules>
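A minimal Python sketch of how a tool might combine the local `translate` attribute with a global `its:translateRule`. It assumes well-formed XHTML input and handles only simple `//prefix:name` selectors, because the standard library's ElementTree implements a limited XPath subset. Inheritance and rule precedence as defined by ITS 2.0 are left out. The function name `non_translatable`, the explicit `prefixes` argument and the second sample paragraph are illustrative assumptions, not part of the talk.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
XHTML_NS = "http://www.w3.org/1999/xhtml"

RULES = f"""<its:rules xmlns:its="{ITS_NS}" version="2.0">
  <its:translateRule selector="//h:code" translate="no"
      xmlns:h="{XHTML_NS}"/>
</its:rules>"""

# Sample document; the second paragraph is made up for illustration.
DOC = f"""<html xmlns="{XHTML_NS}"><body>
  <p>The <span translate="no">World Wide Web Consortium</span>
  is making the World Wide Web worldwide!</p>
  <p>Run <code>make install</code> first.</p>
</body></html>"""


def non_translatable(doc_xml, rules_xml, prefixes):
    """Return the text of elements marked as not translatable.

    ElementTree discards prefix declarations while parsing, so the
    prefix-to-namespace mapping used by the selectors is passed in.
    """
    doc = ET.fromstring(doc_xml)
    rules = ET.fromstring(rules_xml)
    # Local markup: elements carrying translate="no" themselves.
    found = [el for el in doc.iter() if el.get("translate") == "no"]
    # Global rules: elements matched by a translateRule selector.
    for rule in rules.findall(f"{{{ITS_NS}}}translateRule"):
        if rule.get("translate") != "no":
            continue
        selector = rule.get("selector", "")
        if selector.startswith("//"):
            selector = "." + selector  # ElementTree needs a relative path
        for el in doc.findall(selector, prefixes):
            if not any(el is seen for seen in found):
                found.append(el)
    return ["".join(el.itertext()) for el in found]


print(non_translatable(DOC, RULES, {"h": XHTML_NS}))
```

A production tool would more likely use a full XPath engine so that arbitrary selectors work.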
10. Sasaki – SOAP! 2014
Example: “Localization Note”
<data
its:locNote="%1$s is the original text's date in the format YYYY-MM-DD HH:MM always in GMT" …>
<value>Translated from English content dated <span id="version-info">%1$s</span> GMT.</value>
</data>
11. Sasaki – SOAP! 2014
Example: “Elements within Text”
<text xmlns:its="http://www.w3.org/2005/11/its"
its:version="2.0">
<body>
<par>Text with <bold its:withinText="yes">bold</bold>.</par>
</body>
</text>
12. Sasaki – SOAP! 2014
Example: “Locale Filter”
<book xmlns:its="http://www.w3.org/2005/11/its">
<info>
<legalnotice its:localeFilterList="en-CA, fr-CA">
<para>This legal notice is only for English and French Canadian
locales.</para>
</legalnotice>
</info>
</book>
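A small Python sketch of how a consumer might decide whether content carrying a locale filter applies to a given target locale. It is a simplification: ITS 2.0 specifies language-range matching following BCP 47, while this version only does case-insensitive exact and prefix matching. The function name `applies_to_locale` and the `filter_type` parameter (standing in for the include/exclude filter type) are assumptions made for illustration.

```python
def applies_to_locale(filter_list, locale, filter_type="include"):
    """Return True if content with this locale filter is meant for `locale`."""
    ranges = [r.strip().lower() for r in filter_list.split(",") if r.strip()]
    tag = locale.lower()
    matched = any(
        r == "*" or tag == r or tag.startswith(r + "-")
        for r in ranges
    )
    return matched if filter_type == "include" else not matched


print(applies_to_locale("en-CA, fr-CA", "fr-CA"))
print(applies_to_locale("en-CA, fr-CA", "en-US"))
```

With the legal notice above, a translation workflow for `en-US` would skip the element, while workflows for `en-CA` and `fr-CA` would keep it.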
13. Sasaki – SOAP! 2014
Example: “Allowed Characters”
<p>Login names can only use letters from A to Z (upper or lowercase)
and the characters underscore (_) and minus (-).
For example:
<code its-allowed-characters="[a-zA-Z_-]">Huck_Finn</code>.</p>
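A short Python sketch of how a translation tool might check a translated string against the allowed characters. It assumes the value of `its-allowed-characters` is a simple character class that Python's `re` module reads the same way; ITS 2.0 defines its own restricted expression syntax, so this is not a full implementation. The function name `uses_only_allowed` is hypothetical.

```python
import re


def uses_only_allowed(text, allowed):
    """Return True if every character of `text` matches the `allowed` class."""
    return re.fullmatch(f"(?:{allowed})*", text) is not None


print(uses_only_allowed("Huck_Finn", "[a-zA-Z_-]"))
print(uses_only_allowed("Huck Finn", "[a-zA-Z_-]"))
```

The second call fails the check because the space is not part of the allowed set.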
14. Sasaki – SOAP! 2014
Example: “Terminology”
<p>And he said: you need a new
<quote its:term="yes"
its:termInfoRef="http://www.directron.com/motherboards1.html"
its:termConfidence="0.5">motherboard</quote></p>
15. Sasaki – SOAP! 2014
Example: “MT Confidence”
<body its-annotators-ref="mt-confidence|file:///tools.xml#T1">
<p>
<span its-mt-confidence=0.8982>Dublin is the capital of Ireland.</span>
</p>
</body>
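A Python sketch of one way a post-editing tool could use MT Confidence: collect the annotated spans and list those below a threshold. It uses only the standard library's `html.parser`. The class and function names, the threshold and the second sample sentence with its score are made up for illustration.

```python
from html.parser import HTMLParser


class MtConfidenceCollector(HTMLParser):
    """Collect (confidence, text) pairs from spans with its-mt-confidence."""

    def __init__(self):
        super().__init__()
        self.segments = []
        self._open = []  # one [confidence, text] entry per open <span>

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            value = dict(attrs).get("its-mt-confidence")
            self._open.append([float(value) if value else None, ""])

    def handle_data(self, data):
        for entry in self._open:
            entry[1] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._open:
            confidence, text = self._open.pop()
            if confidence is not None:
                self.segments.append((confidence, " ".join(text.split())))


def needs_post_editing(html, threshold):
    """Return the text of segments whose confidence is below `threshold`."""
    collector = MtConfidenceCollector()
    collector.feed(html)
    collector.close()
    return [text for confidence, text in collector.segments
            if confidence < threshold]


# The second span is a hypothetical low-confidence segment.
SAMPLE = """<p>
<span its-mt-confidence=0.8982>Dublin is the capital of
Ireland.</span>
<span its-mt-confidence=0.4>Prague capital Czech is.</span>
</p>"""

print(needs_post_editing(SAMPLE, 0.5))
```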
16. Sasaki – SOAP! 2014
Example: “Provenance”
<p its-tool-ref="http://www.onlinemtex.com/2012/7/25/wsdl/"
its-org="acme-CAT-v2.3"
its-prov-ref="http://www.examplelsp.com/excontent987/production/prov/e6354"
its-rev-org="acme-CAT-v2.3"
>This paragraph was translated from the machine.</p>
17. Sasaki – SOAP! 2014
Example: “Localization Quality Issue”
<p>
<span
data-mytool-qacode=named_entity_not_found
its-loc-quality-issue-comment="Should be Thomas Cahill."
its-loc-quality-issue-profile-ref=http://example.org/qaMovel/v1
its-loc-quality-issue-severity=100
its-loc-quality-issue-type=inconsistent-entities>
Christian Bale</span> (1867–1934) conceived of an instrument
… </p>
18. Sasaki – SOAP! 2014
Example: “Text Analysis”
• Identify concepts in content, like named
entities
– persons, places, events, …
• Store identifiers in (Web) content
• Provide a link to multilingual linked data
sources – a basis for content curation
19. Sasaki – SOAP! 2014
Example: “Text Analysis”
<p><span
its-ta-confidence="0.7"
its-ta-class-ref="http://nerd.eurecom.fr/ontology#Location"
its-ta-ident-ref="http://dbpedia.org/resource/Dublin"
>Dublin</span> is the <span
its-ta-source="Wordnet3.0"
its-ta-ident="301467919"
its-ta-confidence="0.5"
>capital</span> of Ireland.</p>
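A Python sketch of how a consumer could read the Text Analysis annotations back out of HTML5 content. It collects every `its-ta-*` attribute per element with the standard library's `html.parser`. The names `TextAnalysisReader` and `read_text_analysis` are illustrative assumptions.

```python
from html.parser import HTMLParser


class TextAnalysisReader(HTMLParser):
    """Collect the its-ta-* attributes of each annotated element."""

    def __init__(self):
        super().__init__()
        self.annotations = []

    def handle_starttag(self, tag, attrs):
        prefix = "its-ta-"
        found = {name[len(prefix):]: value
                 for name, value in attrs if name.startswith(prefix)}
        if found:
            self.annotations.append(found)


def read_text_analysis(html):
    reader = TextAnalysisReader()
    reader.feed(html)
    reader.close()
    return reader.annotations


SAMPLE = """<p><span
its-ta-confidence="0.7"
its-ta-class-ref="http://nerd.eurecom.fr/ontology#Location"
its-ta-ident-ref="http://dbpedia.org/resource/Dublin"
>Dublin</span> is the <span
its-ta-source="Wordnet3.0"
its-ta-ident="301467919"
its-ta-confidence="0.5"
>capital</span> of Ireland.</p>"""

for annotation in read_text_analysis(SAMPLE):
    print(annotation)
```

The `ident-ref` values are the identifiers that later steps can use to look up multilingual linked data.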
20. Sasaki – SOAP! 2014
What content authors can do with multilingual
linked data sources and ITS 2.0
• Add value to content beyond the content itself
• Curate content: provide identifiers, context,
cross-lingual information
• Tool examples:
1) Generation of ITS 2.0 “Text Analysis” for ePub, and
Schema.org markup
2) Generation of translation suggestions
3) Working with linked data in the browser – without
understanding details
21. Sasaki – SOAP! 2014
TOOLING 1): GENERATION OF ITS 2.0
“TEXT ANALYSIS” AND SCHEMA.ORG
MARKUP FOR EPUB
22. Sasaki – SOAP! 2014
Setup
• oXygen XML editor, modified for ePub /
XHTML5 author mode
• Input: ePub or XHTML5 documents
• Output: documents enriched with Schema.org
structured information
• The user generates the information in
WYSIWYG mode
23. Sasaki – SOAP! 2014
Process
1. Automatic generation of entity annotation,
using DBpedia Spotlight, producing DBpedia
identifiers
2. Access to DBpedia information with
predefined linked data queries
3. Generation of Schema.org markup
24. Sasaki – SOAP! 2014
1. Automatic generation of entity
annotation
• Input:
<p>Welcome to Dublin in Ireland, the home of Samuel
Beckett.</p>
25. Sasaki – SOAP! 2014
1. Automatic generation of entity
annotation
• Output, stored with ITS 2.0 “Text Analysis”
markup:
<p>Welcome to <span
its-ta-ident-ref="http://dbpedia.org/resource/Dublin"
...>Dublin</span> in Ireland, the home of <span
its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett"
...>Samuel Beckett</span>.</p>
26. Sasaki – SOAP! 2014
2. Access to DBpedia information
• Using DBpedia identifiers from previous steps
in linked data query templates. Example query
(part of the query), checking whether entity is
a person:
SELECT ?birthPlace ... WHERE{
<http://dbpedia.org/resource/Samuel_Beckett>
rdf:type foaf:Person.
... }
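A Python sketch of a query template filled with a DBpedia identifier. Because the query on the slide is only shown in part, this sketch does not try to reconstruct it. It uses a simpler `ASK` query that only tests the `foaf:Person` type, which is an assumption and not the tool's actual query. The function name `person_check_query` is hypothetical.

```python
PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
)


def person_check_query(entity_uri):
    """Build an ASK query testing whether the entity is a foaf:Person."""
    # Reject characters that are not allowed inside <...> in SPARQL,
    # so that an identifier cannot break out of the template.
    if any(c in '<>"{}|^`\\' or c <= " " for c in entity_uri):
        raise ValueError(f"not a usable IRI: {entity_uri!r}")
    return f"{PREFIXES}ASK {{ <{entity_uri}> rdf:type foaf:Person . }}"


print(person_check_query("http://dbpedia.org/resource/Samuel_Beckett"))
```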
27. Sasaki – SOAP! 2014
3. Generation of Schema.org
structured information
• Using output of previous step (query result)
• Generating Schema.org structured
information
– Taking types derived from DBpedia into account,
currently
• http://schema.org/Person
• http://schema.org/Place
28. Sasaki – SOAP! 2014
3. Generation of Schema.org
structured information
• Input: linked data query result and marked-up
document
<p>Welcome to <span
its-ta-ident-ref="http://dbpedia.org/resource/Dublin"
...>Dublin</span> in Ireland, the home of <span
its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett"
...>Samuel Beckett</span>.</p>
29. Sasaki – SOAP! 2014
3. Generation of Schema.org
structured information
• Output: marked-up document with
Schema.org structured information
<p>Welcome to <span ... itemscope=""
itemtype="http://schema.org/Place">
<a itemprop="url" href="
http://en.wikipedia.org/wiki/Dublin"><span itemprop="name"
>Dublin</span></a></span>…</p>
30. Sasaki – SOAP! 2014
3. Generation of Schema.org
structured information
• Output: auto-generating markup + text
<p>... Samuel Beckett ... (born in <span itemscope=""
itemtype="http://schema.org/Place">
<a itemprop="url"
href="http://en.wikipedia.org/wiki/Foxrock">
<span itemprop="name"
>Foxrock</span></a></span>)</p>
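A Python sketch of the last step: turning a name, a Schema.org type and a URL into microdata markup shaped like the output above. The function name `schema_org_span` is an assumption, and the real tool may build the markup differently, for example inside the editor's document model.

```python
from html import escape


def schema_org_span(name, item_type, url):
    """Return microdata markup for one entity, with values escaped."""
    return (
        f'<span itemscope="" itemtype="{escape(item_type)}">'
        f'<a itemprop="url" href="{escape(url)}">'
        f'<span itemprop="name">{escape(name)}</span></a></span>'
    )


print(schema_org_span(
    "Foxrock",
    "http://schema.org/Place",
    "http://en.wikipedia.org/wiki/Foxrock",
))
```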
31. Sasaki – SOAP! 2014
Checking output with
Structured Data Testing Tool
32. Sasaki – SOAP! 2014
Broad review: a view of schema.org
types that may work well
Book (dbpedia-owl:Book)
City (dbpedia-owl:City)
Country (dbpedia-owl:Country)
Event (dbpedia-owl:Event)
Hotel (dbpedia-owl:Hotel)
Library (dbpedia-owl:Library)
Movie (dbpedia-owl:Film)
Person (foaf:Person)
Place (dbpedia-owl:Place)
Organization (dbpedia-owl:Organization)
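The list above can be read as a lookup table from DBpedia types to Schema.org types. A Python sketch under that reading follows. The function name `schema_org_type` is hypothetical, and the sketch assumes the caller passes the entity's types ordered from most to least specific.

```python
# Mapping taken from the list above: DBpedia type -> Schema.org type name.
SCHEMA_ORG_TYPES = {
    "dbpedia-owl:Book": "Book",
    "dbpedia-owl:City": "City",
    "dbpedia-owl:Country": "Country",
    "dbpedia-owl:Event": "Event",
    "dbpedia-owl:Hotel": "Hotel",
    "dbpedia-owl:Library": "Library",
    "dbpedia-owl:Film": "Movie",
    "foaf:Person": "Person",
    "dbpedia-owl:Place": "Place",
    "dbpedia-owl:Organization": "Organization",
}


def schema_org_type(rdf_types):
    """Return the Schema.org URL for the first mapped type, or None."""
    for rdf_type in rdf_types:
        name = SCHEMA_ORG_TYPES.get(rdf_type)
        if name:
            return "http://schema.org/" + name
    return None


print(schema_org_type(["dbpedia-owl:City", "dbpedia-owl:Place"]))
```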
34. Sasaki – SOAP! 2014
Generating translation suggestions
• Input: like before
• Steps:
1. Entity annotations (again)
2. Access to DBpedia and Wikidata to get
translation suggestions
3. Storing the results as a localization note
35. Sasaki – SOAP! 2014
1. Automatic generation of entity
annotation
• Output, stored with ITS 2.0 “Text Analysis”
markup:
<p>Welcome to <span
its-ta-ident-ref="http://dbpedia.org/resource/Dublin"
...>Dublin</span> in Ireland, the home of <span
its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett"
...>Samuel Beckett</span>.</p>
36. Sasaki – SOAP! 2014
2. Access to DBpedia and Wikidata to
get translation suggestions
• Get translation suggestion from DBpedia
SELECT ?o WHERE {
<http://dbpedia.org/resource/Samuel_Beckett> rdfs:label ?o
}
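A Python sketch of how the labels returned by such a query could be sorted by language. It assumes the endpoint returns the W3C SPARQL JSON results format, in which each binding of `?o` carries `type`, `value` and, for language-tagged literals, `xml:lang`. The sample below is hand-written in that shape for illustration and is not real endpoint output; the function name `labels_by_language` is hypothetical.

```python
def labels_by_language(results):
    """Map language tag -> first label found for that language."""
    labels = {}
    for binding in results["results"]["bindings"]:
        label = binding.get("o")
        if label and label.get("type") == "literal":
            labels.setdefault(label.get("xml:lang", ""), label["value"])
    return labels


# Hand-written sample in SPARQL JSON results shape (illustrative only).
SAMPLE_RESULTS = {
    "head": {"vars": ["o"]},
    "results": {"bindings": [
        {"o": {"type": "literal", "xml:lang": "en",
               "value": "Samuel Beckett"}},
        {"o": {"type": "literal", "xml:lang": "ja",
               "value": "サミュエル・ベケット"}},
    ]},
}

print(labels_by_language(SAMPLE_RESULTS).get("ja"))
```

The label in the target language then serves as the translation suggestion.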
37. Sasaki – SOAP! 2014
2. Access to DBpedia and Wikidata to
get translation suggestions
• Get translation suggestion from Wikidata
http://www.wikidata.org/w/api.php?action=
wbgetentities&
sites=itwiki&
titles=Samuel%20Beckett
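A Python sketch that builds the request URL above from its parts, so that titles with spaces or other special characters are percent-encoded correctly. It uses only the parameters shown on the slide; the function name `wikidata_entities_url` is an assumption, and the request itself is not sent here.

```python
from urllib.parse import quote, urlencode

API = "http://www.wikidata.org/w/api.php"


def wikidata_entities_url(site, title):
    """Return the wbgetentities URL for a page title on a given wiki."""
    params = {"action": "wbgetentities", "sites": site, "titles": title}
    # quote_via=quote encodes spaces as %20 instead of "+".
    return API + "?" + urlencode(params, quote_via=quote)


print(wikidata_entities_url("itwiki", "Samuel Beckett"))
```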
38. Sasaki – SOAP! 2014
3. Storing the results as ITS 2.0
localization note
• Input: DBpedia + Wikidata query result and
marked-up document
<p>… the home of <span
its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett"
...>Samuel Beckett</span>.</p>
39. Sasaki – SOAP! 2014
3. Storing the results as localization
note
• Output: Translation suggestions stored as
localization note
<p>… the home of <span
its-ta-ident-ref="http://dbpedia.org/resource/Samuel_Beckett"
its-loc-note="
TRANSLATION SUGGESTIONS: 1) wikidata:サミュエル・ベケット
2) dbpedia:サミュエル・ベケット"
...>Samuel Beckett</span>.</p>
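A Python sketch of how the suggestions could be assembled into the value of `its-loc-note`. The function name `loc_note_value` is hypothetical, and the single-line layout is a simplification of the note shown above. The value is escaped so it can be placed safely inside a quoted attribute.

```python
from html import escape


def loc_note_value(suggestions):
    """Build an its-loc-note value from (source, text) pairs."""
    parts = [f"{number}) {source}:{text}"
             for number, (source, text) in enumerate(suggestions, 1)]
    return escape("TRANSLATION SUGGESTIONS: " + " ".join(parts), quote=True)


print(loc_note_value([
    ("wikidata", "サミュエル・ベケット"),
    ("dbpedia", "サミュエル・ベケット"),
]))
```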
40. Sasaki – SOAP! 2014
TOOLING 3: WORKING WITH LINKED
DATA IN THE BROWSER – WITHOUT
UNDERSTANDING DETAILS
41. Sasaki – SOAP! 2014
MLOD4CON
• Working with links to external multilingual
data sources
• Under the hood: lots of technology
– ITS 2.0, RDF, SPARQL, JavaScript, …
• Good news: the user does not need to know
about these
Demo at
http://www.w3.org/People/fsasaki/mlod4con/
43. Sasaki – SOAP! 2014
Issues
• Learn from communities what they want to do
with ITS 2.0 and linked data sources
– Content creators and content architects,
translators, XML / Web tool makers, researchers in
the data and language technology area, …
• Provide adequate tooling
• Look carefully into requirements: “Too much
information is no information!”
44. Sasaki – SOAP! 2014
What next for you?
• ITS 2.0 Tooling
https://www.w3.org/International/its/wiki/ITS_Implementations
• Videos explaining ITS 2.0 usage
https://www.youtube.com/user/W3CITS20/videos
• Linked Data for Language Technology Community
Group: discuss use cases and requirements for
multilingual linked data
http://www.w3.org/community/ld4lt/
• ITS Interest Group: Join the community of ITS 2.0
users and implementers
https://www.w3.org/International/its/ig/