Georg Rehm
German Research Center for Artificial Intelligence (DFKI) GmbH
Annotation in scholarly editions and research
Bergische Universität Wuppertal – 21 February 2019
Observations on Annotations
From Computational Linguistics and the
World Wide Web to AI and back again
Observations on Annotations – Wuppertal, Germany, 21 February 2019 2
Annotation: Personal Background
• Computational Linguistics and AI (since 1992)
• SGML and TEI (since 1995), XML since 1998, XSLT, XPath, several others ...
• Corpus annotation formats
• Hypertext and Textlinguistics
• Web Technologies, W3C, Markup Languages
• W3C Office Germany/Austria (since 2013)
• AI and Language Technology Development (since 2009)
• Infrastructures and Platforms, Service Deployment
• Research Data, Language Resources, Metadata, Data Formats
• Open Science
Introduction
• Annotations have been playing an important role in
Computational Linguistics and related fields
(especially Digital Humanities) for decades.
• This talk: Recent examples, lessons learned and
some general observations on annotations.
• My own research in this area (since approx. 1996):
– from basic and applied research to
– innovation and technology development
Outline
• Annotations – brief definition
• World Wide Web
• Annotations and AI
• Annotations and Computational Linguistics
• Annotations and Language Technology
• Annotations for a Credible Web
• Annotations and Open Science
• Annotations and Markup
• Dimensions of Annotations
• Summary and Conclusions
Annotations:
a brief definition
Annotations
• Definition/“Definition”:
Secondary data added to a piece of primary data –
in science, this is, often, research data.
• Wikipedia:
An annotation is a metadatum (e.g., a post, explanation,
markup) attached to [a?] location or other data.
http://www.merriam-webster.com
• Literature and education:
– Textual scholarship: Textual scholarship is a discipline that often
uses the technique of annotation to describe or add additional
historical context to texts and physical documents.
– Learning and instruction: As part of guided noticing [annotation]
involves highlighting, naming or labelling and commenting
aspects of visual representations to help focus learners' attention
on specific visual aspects. In other words, it means the assignment
of typological representations (culturally meaningful categories),
to topological representations (e.g. images).
• Software engineering:
– Text documents: Markup languages like XML and HTML annotate
text in a way that is syntactically distinguishable from that text.
They can be used to add information about the desired visual
presentation, or machine-readable semantic information, as in
the semantic web.
• Linguistics:
– In linguistics, annotations include comments and metadata; these
non-transcriptional annotations are also non-linguistic.
World Wide Web
“Vague but exciting”
Information Management: A Proposal
Tim Berners-Lee, CERN, March 1989, May 1990
“Private links
One must be able to add one's own private links to and from public
information. One must also be able to annotate links, as well as nodes,
privately.”
World Wide Web Consortium
• W3C is an international non-profit member-financed
standards developing organisation
• Founded in 1994 by Sir Tim Berners-Lee
• Currently 451 members – 23 in Germany/Austria
• Approx. 60 staff (ERCIM, MIT, Keio University, Beihang University)
• Approx. 20 offices in important regions
• The W3C Office Germany/Austria is run by
• Open Web Platform, HTML5, CSS, Credible Web, Digital
Publishing, Linked Data etc.
http://w3.org, http://w3c.de
Interested in joining? Talk to me!
Relevant W3C Standards
• XML – Extensible Markup Language
– Extremely influential
– Widely adopted
– TEI and many other languages
• Semantic Web
– RDF, OWL, SPARQL, SKOS etc.
• Digital Publishing
– New versions of EPUB
• Web Annotation Data Model and Vocabulary
https://www.w3.org/2001/10/03-sww-1/slide7-0.html
Web Annotation
Web Annotations
• Web Annotation – Three W3C Recommendations
• Most popular and relevant implementation: Hypothes.is
– Mission-driven, non-profit Open Source company
– Main focus on scholarly publishing
(“Annotating All Knowledge Coalition”)
– Very active and vibrant community
• Hypothes.is: main driving force
behind the I Annotate conference series
– Open proceedings, very interesting programme, diverse
speakers from several disciplines – consider attending!
– Videos of almost all previous events available online
• Web Annotation Data Model
Describes the underlying Annotation Abstract Data
Model as well as a JSON-LD serialization
• Web Annotation Vocabulary
The Vocabulary which underpins the
Web Annotation Data Model
• Web Annotation Protocol
The HTTP API for publishing, syndicating,
and distributing Web Annotations
• Published on 23 February 2017
Web Annotation Standard
• What does this mean for end users?
– Annotation: a set of connected resources, typically incl. a
body and target – the body is related to the target.
– No more comment widgets and silos!
– Annotation capability can be built natively into the browser
– Conversations can take place anywhere on the web and in
a standards-based way
• Why is this different?
– Annotations can live separately from documents and are
reunited and re-anchored in real-time
– Annotations are under the control of the user
– Users can form communities (across HTML, PDF etc.)
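The body/target structure described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete implementation of the Recommendation: the `@context` URL, `type`, `motivation`, `TextualBody` and `TextQuoteSelector` come from the W3C Web Annotation Data Model, while the helper name `make_annotation` and all `example.org` URLs are assumptions made for this example.

```python
import json


def make_annotation(annotation_id: str, target_url: str,
                    quoted_text: str, comment: str) -> dict:
    """Build a minimal annotation in the JSON-LD serialization of the
    W3C Web Annotation Data Model (hypothetical helper)."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,            # IRI that identifies the annotation
        "type": "Annotation",
        "motivation": "commenting",
        # The body: what is being said about the target.
        "body": {
            "type": "TextualBody",
            "value": comment,
            "format": "text/plain",
        },
        # The target: the resource (and the passage in it) being annotated.
        "target": {
            "source": target_url,
            "selector": {
                "type": "TextQuoteSelector",
                "exact": quoted_text,
            },
        },
    }


# All URLs below are placeholders.
annotation = make_annotation(
    "https://example.org/annotations/1",
    "https://example.org/article.html",
    "annotate links, as well as nodes",
    "Annotation was already part of the original proposal.",
)
print(json.dumps(annotation, indent=2))
```

Because the annotation refers to its target by URL and selector, it can be stored separately from the annotated document and re-anchored when the document is displayed.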
Hypothes.is Statistics
December 2018: 4.4 Million Annotations and Counting
[Chart: number of Hypothes.is annotations from January 2015 to December 2018 (y-axis from 20K to 260K), broken down into four categories: in groups (private), in groups (shared), private, public]
The Hypothes.is Tool
• Private notes
• Public annotations
• Collaboration groups
• Linked Data connections
• Cross format:
○ HTML
○ PDF
○ EPUB
○ Data
• Community driven
• Open Source
Open Groups
Errata and Corrections
ADA: American Diabetes Association
● Wanted a way to update content
and add information links
● Needed to restrict use to ADA staff
Peer Review
Automated Annotation
Automated systems can
tag elements such as
RRIDs (Research
Resource Identifiers) and
other scholarly identifiers
or entities, allowing
navigation to background
information and powerful
search queries through
other papers mentioning
the same entity.
User Profiles
Use anywhere on the web
Annotations and AI
Data Intelligence
Current breakthroughs based on Machine Learning (“Deep Learning”)
Also still in use: symbolic, rule-based methods and expert systems
Artificial Intelligence
Huge data sets + powerful learning algorithms + very fast hardware
Annotations and AI
• Modern AI is data-driven – supervised learning relies
on annotated data sets.
• However, certain AI algorithms can learn structure and
patterns without any annotations whatsoever.
• The relevance of annotations has increased dramatically.
• This is especially true for very large annotated data sets.
• Many consist of primary data and secondary annotations.
• Companies have emerged that produce annotated data
sets using crowd-workers (e.g., Figure Eight, Crowdee)
• Key question: how detailed, relevant, correct, meaningful
and reliable are these annotations really?
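One common way to approach the reliability question is to let two annotators label the same items and measure their agreement beyond chance. The following is a minimal sketch of Cohen's kappa for two annotators; the label sequences are invented for illustration and do not come from any real data set.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labelled independently,
    # each following their own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Invented example: two annotators assign entity types to six tokens.
annotator_1 = ["PER", "LOC", "LOC", "ORG", "PER", "LOC"]
annotator_2 = ["PER", "LOC", "ORG", "ORG", "PER", "PER"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 2))  # 0.52
```

A value of 1 means perfect agreement and 0 means agreement at chance level; how high a score needs to be before an annotated data set counts as reliable depends on the task and the annotation scheme.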
Annotations and Events
• Likes and Favs (user-driven annotation, action)
• Five-star ratings (user-driven annotation, action)
• Online comments (user-driven annotation, action)
• Online reviews (user-driven annotation, action)
• Clicking an article headline/link (user-initiated event, action)
• Reading an ebook (user-initiated event, action)
– Page turns in ebooks are measured – when slow: “boredom”, “disinterest”
– Next time in the ebook store you’re getting adjusted recommendations
• No longer reading an ebook (user-initiated event, non-action)
– Boring chapters where people throw in the towel can be easily identified
– (Brave new) future: use automatic paraphrasing to re-write the chapter
– Or maybe NLG and A/B tests – then it’s the original author vs. the machine
Annotations in
Computational
Linguistics
Annotations in CL
• Diverse and specialised tool landscape
http://annotation.exmaralda.org/index.php?title=Linguistic_Annotation
• Diverse and specialised format landscape:
TEI, NIF, NAF, LAF, TIGER, STTS, FoLiA
and many, many others
• From trivial annotation schemes to extremely complex
• From low inter-annotator agreement scores to high ones
• From flexible tools to highly specialised tools
• From very high quality annotations to very low ones
• A brief look at a few tools …
[Screenshots of the following annotation tools:]
• EXMARaLDA
• Praat
• ELAN
• brat
• WebAnno
• ANNIS
Annotations in
Language Technology
Language Technology
• Language Technology transfers theoretical results from
language-oriented research into technologies and
applications that are ready for production use.
• Uses results from, e.g.:
– Artificial Intelligence
– Computer Science
– Computational Linguistics
– Natural Language Processing
– Psychology, Psycholinguistics
– Cognitive Science
Example Applications
• Spell checkers
• Dictation systems
• Translation systems
• Search engines
• Report generation
• Expert systems
• Dialogue systems
• Text summarisers
Web Annotation Architecture
The relationship between
Web Annotations
and Language Technology
on a rather general level.
Content could be created by Language
Technology fully automatically or in a
semi-automatic way (text generation)
Content could be analysed by
Language Technology (semantic
analysis, input for ML algorithms etc.)
Especially in Social Media Analytics (SMA)
we are interested in user-generated content (UGC),
i.e., in comments and feedback: “what do
users think of a certain product?”
• Analysing UGC is difficult and
costly (many heterogeneous
sources, many different formats)
• A few established and widely used
Web Annotation services would
simplify SMA dramatically!
We can also use LT methods to
create or help create annotations,
e.g., in smart authoring scenarios.
LT and Web Annotations
• Analysis of web annotations and exploiting web
annotations through Language Technology:
– Arbitrary web annotations (i.e., unstructured text)
• No more crawling, aggregating, mapping!
– Dedicated LT-specific web annotations
• Annotating language data without any specialised
stand-alone tools or data repositories!
• Generation of web annotations through Language
Technology (e.g., to provide background information on
important content). Example: Content semantification.
Platform for digital Curation Technologies
[Architecture diagram: clients use the Broker REST API; behind it sit Curation Service 1, Curation Service 2, External Service 1 and External Service 2, which are combined into a curation workflow leading from input to output]
# The dfkinif: prefix (used below) has no @prefix declaration in this excerpt.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .

# The whole document (context): characters 0 to 26
<http://link.omitted/documents/document1#char=0,26>
    a nif:RFC5147String , nif:String , nif:Context ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "26"^^xsd:nonNegativeInteger ;
    nif:isString "Welcome to Berlin in 2016."^^xsd:string ;
    dfkinif:averageLatitude "52.516666666666666"^^xsd:double ;
    dfkinif:averageLongitude "13.383333333333333"^^xsd:double ;
    dfkinif:stdDevLatitude "0.0"^^xsd:double ;
    dfkinif:stdDevLongitude "0.0"^^xsd:double ;
    nif:meanDateRange "20160101010000_20170101010000"^^xsd:string .

# Temporal expression "2016": characters 21 to 25
<http://link.omitted/documents/document1#char=21,25>
    a nif:RFC5147String , nif:String ;
    itsrdf:taIdentRef <http://link.omitted/ontologies/nif#date=20160101000000_20170101000000> ;
    nif:anchorOf "2016"^^xsd:string ;
    nif:beginIndex "21"^^xsd:nonNegativeInteger ;
    nif:endIndex "25"^^xsd:nonNegativeInteger ;
    nif:entity <http://link.omitted/ontologies/nif#date> .

# Named entity "Berlin": characters 11 to 17, linked to DBpedia
<http://link.omitted/documents/document1#char=11,17>
    a nif:RFC5147String , nif:String ;
    nif:anchorOf "Berlin"^^xsd:string ;
    nif:beginIndex "11"^^xsd:nonNegativeInteger ;
    nif:endIndex "17"^^xsd:nonNegativeInteger ;
    itsrdf:taClassRef <http://dbpedia.org/ontology/Location> ;
    nif:referenceContext <http://link.omitted/documents/document1#char=0,26> ;
    geo:lat "52.516666666666666"^^xsd:double ;
    geo:long "13.383333333333333"^^xsd:double ;
    itsrdf:taIdentRef <http://dbpedia.org/resource/Berlin> .
NLP Interchange
Format (NIF)
“Welcome to Berlin in 2016.”
• RDF/OWL-based format for NLP
applications
• Enables interoperability
• Pure RDF allows “natural”
integration of Linked Data
• Developed by Universität Leipzig
• In addition to NIF, the platform also
supports Web Annotations
Digital Curation Technologies:
Prototypically implemented Platform and Services
Peter Bourgonje, Julian Moreno-Schneider, Jan Nehring, Georg Rehm, Felix Sasaki, and Ankit Srivastava.
“Towards a Platform for Curation Technologies: Enriching Text Collections with a Semantic-Web Layer.” In
Harald Sack, Giuseppe Rizzo, Nadine Steinmetz, Dunja Mladenić, Sören Auer, and Christoph Lange,
editors, The Semantic Web, number 9989 in LNCS, pages 65-68. Springer, June 2016. ESWC 2016
Satellite Events. Heraklion, Crete, Greece, May 29 - June 2, 2016 Revised Selected Papers.
# The dfkinif: prefix (used below) has no @prefix declaration in this excerpt.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .

# The whole document (context): characters 0 to 26
<http://link.omitted/documents/document1#char=0,26>
    a nif:RFC5147String , nif:String , nif:Context ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "26"^^xsd:nonNegativeInteger ;
    nif:isString "Welcome to Berlin in 2019."^^xsd:string ;
    dfkinif:averageLatitude "52.516666666666666"^^xsd:double ;
    dfkinif:averageLongitude "13.383333333333333"^^xsd:double ;
    dfkinif:stdDevLatitude "0.0"^^xsd:double ;
    dfkinif:stdDevLongitude "0.0"^^xsd:double ;
    nif:meanDateRange "20190101010000_20200101010000"^^xsd:string .

# Temporal expression "2019": characters 21 to 25
<http://link.omitted/documents/document1#char=21,25>
    a nif:RFC5147String , nif:String ;
    itsrdf:taIdentRef <http://link.omitted/ontologies/nif#date=20190101000000_20200101000000> ;
    nif:anchorOf "2019"^^xsd:string ;
    nif:beginIndex "21"^^xsd:nonNegativeInteger ;
    nif:endIndex "25"^^xsd:nonNegativeInteger ;
    nif:entity <http://link.omitted/ontologies/nif#date> .

# Named entity "Berlin": characters 11 to 17, linked to DBpedia
<http://link.omitted/documents/document1#char=11,17>
    a nif:RFC5147String , nif:String ;
    nif:anchorOf "Berlin"^^xsd:string ;
    nif:beginIndex "11"^^xsd:nonNegativeInteger ;
    nif:endIndex "17"^^xsd:nonNegativeInteger ;
    itsrdf:taClassRef <http://dbpedia.org/ontology/Location> ;
    nif:referenceContext <http://link.omitted/documents/document1#char=0,26> ;
    geo:lat "52.516666666666666"^^xsd:double ;
    geo:long "13.383333333333333"^^xsd:double ;
    itsrdf:taIdentRef <http://dbpedia.org/resource/Berlin> .
NLP Interchange
Format (NIF)
“Welcome to Berlin in 2019.”
• RDF/OWL-based format for NLP
applications
• Enables interoperability
• Pure RDF and, hence, natural
integration of Linked Data data
• Developed by Universität Leipzig
• Our platform also supports Web
Annotation data model
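NIF addresses substrings through character offsets: `nif:beginIndex` and `nif:endIndex` select a span of the context string, and `nif:anchorOf` repeats the selected text. The sketch below checks this relationship for the example sentence. It assumes that the context is the 26-character sentence without trailing whitespace and that offsets are zero-based with an exclusive end; the `Span` class and both helper functions are hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    begin: int   # corresponds to nif:beginIndex
    end: int     # corresponds to nif:endIndex (exclusive)
    anchor: str  # corresponds to nif:anchorOf


def fragment_uri(document_uri: str, span: Span) -> str:
    """Build an RFC 5147 style fragment identifier as used in the listing."""
    return f"{document_uri}#char={span.begin},{span.end}"


def mismatched_spans(context: str, spans):
    """Return all spans whose offsets do not select their anchor text."""
    return [s for s in spans if context[s.begin:s.end] != s.anchor]


context = "Welcome to Berlin in 2019."
spans = [Span(11, 17, "Berlin"), Span(21, 25, "2019")]

print(len(context))                       # 26
print(mismatched_spans(context, spans))   # []
print(fragment_uri("http://link.omitted/documents/document1", spans[0]))
```

A check like this is useful because offset-based annotations silently break when the primary text changes, for example when whitespace is normalised.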
Julian Moreno-Schneider, Ankit Srivastava, Peter Bourgonje, David Wabnitz, and Georg Rehm. “Semantic Storytelling, Cross-
lingual Event Detection and other Semantic Services for a Newsroom Content Curation Dashboard.” In Octavian Popescu and
Carlo Strapparava, editors, Proceedings of Natural Language Processing meets Journalism - EMNLP 2017 Workshop (NLPMJ
2017), Copenhagen, Denmark, September 2017. 7. September.
Sector: Journalism
Sector: TV, Web-TV, Media
Georg Rehm, Julián Moreno Schneider, Peter Bourgonje, Ankit Srivastava, Rolf Fricke, Jan Thomsen, Jing He, Joachim Quantz, Armin Berger, Luca König, Sören
Räuchle, Jens Gerth, and David Wabnitz. “Different Types of Automated and Semi-Automated Semantic Storytelling: Curation Technologies for Different Sectors”.
In Georg Rehm and Thierry Declerck, editors, Language Technologies for the Challenges of the Digital Age: 27th International Conference, GSCL 2017, Berlin,
Germany, September 13-14, 2017, Proceedings, number 10713 in Lecture Notes in Artificial Intelligence (LNAI), pages 232-247, Cham, Switzerland, January 2018.
Gesellschaft für Sprachtechnologie und Computerlinguistik e.V., Springer. 13/14 September 2017.
Annotations for a
Credible Web
Viral Content and Filter Bubbles
• Content is often published without checking its validity,
discovered through social media and, if it appears
relevant, shared immediately.
• Content is often shared without reading it.
• Goal: virality ➟ reach ➟ clicks ➟ ad revenue
• Not all “journalistic” content (or publishing outlets) is really
committed to reporting the facts.
• Nowadays the burden of fact-checking is with the readers.
• “Fake news”: label for several classes of online content.
• Can we balance out filter bubble and network effects?
Georg Rehm. “An Infrastructure for Empowering Internet Users to handle Fake News and other Online Media Phenomena”. In Georg
Rehm and Thierry Declerck, editors, Language Technologies for the Challenges of the Digital Age: Proceedings of the GSCL
Conference 2017, Berlin, September 2017. Gesellschaft für Sprachtechnologie und Computerlinguistik e.V. 13.-15. September 2017.
Seven classes of false news:
• Satire or parody
• Wrong connection or relation: when title and photos don't support the content
• Misleading content: use of information to put someone or something in a bad light
• Wrong context: when genuine content is presented in the wrong context
• Deceiving content: imitation of real sources
• Bad content with a clear purpose to deceive
• Fabricated content: completely untrue, produced to deceive
Characteristics
Clickbait X X ? ? ?
Disinformation X X X X
Political bias ? X ? ? X
Bad journalism X X X
Publisher's intention
Parody X ? ?
Provocation X X X
Profit ? X X X
Deception X X X X X X
Influence politics X X X X
Influence politics X X X X X
Different classes of false news and their individual characteristics and intentions
(based on Wardle, 2017; Walbrühl, 2017; Rubin et al., 2015; Holan, 2016; Weedon et al., 2017)
[Architecture diagram of the proposed infrastructure:]
• Website with content
• Browser has native support for the infrastructure and aggregates the different scores, messages and values into messages or warnings regarding the content
• Tool 1, Tool 2, Tool 3 (detection of hate speech, classification of content by political spectrum, fact checker): decentralised filters process content automatically and send results to the browser (important: multilingualism)
• Web Annotations DB1 to DB4: decentralised repositories store all annotations
• Other users: other users' annotations
• Example: user rates the content quality according to a standardised schema
• UGA: User-generated annotations (free text)
• UGM: User-generated metadata (standardised)
• MGM: Machine-generated metadata (standardised)
• Infrastructure as a native part of the web
• Necessary for that: support and buy-in from all
browser vendors, media publishers and standards
• All users need immediate access
Tools analyse
automatically
• Automatic results and free text
annotations are stored as Web
Annotations.
• Users make their annotations
available to one another.
• Automatic analysis of free text
annotations (NLP, IE, RE etc.).
• Extraction of opinions, arguments,
claims, statements etc.
UGM
• Standardised metadata schemas for efficient annotations,
e.g. “content is intentionally deceptive.”
• W3C Provenance Ontology, Schema.org (ClaimReview).
• To be used by the human and the machine
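As an illustration of such standardised metadata, the sketch below builds a Schema.org `ClaimReview` record as a Python dictionary. The property names (`claimReviewed`, `itemReviewed`, `reviewRating`, `ratingValue`, `bestRating`, `worstRating`, `alternateName`, `author`) are Schema.org terms; the helper name, the 1 to 5 rating scale, the reviewer and all URLs are assumptions made for this example.

```python
import json


def make_claim_review(page_url: str, claim: str, rating_value: int,
                      rating_label: str, reviewer_name: str) -> dict:
    """Build a Schema.org ClaimReview record (hypothetical helper).

    Assumes a rating scale from 1 (false) to 5 (true).
    """
    return {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "claimReviewed": claim,
        # The content in which the claim was made.
        "itemReviewed": {"@type": "CreativeWork", "url": page_url},
        "reviewRating": {
            "@type": "Rating",
            "ratingValue": rating_value,
            "worstRating": 1,
            "bestRating": 5,
            "alternateName": rating_label,  # human-readable verdict
        },
        "author": {"@type": "Organization", "name": reviewer_name},
    }


# Placeholder values for illustration only.
review = make_claim_review(
    "https://example.org/news/article-123",
    "Content is intentionally deceptive.",
    1,
    "False",
    "Example Fact Checkers",
)
print(json.dumps(review, indent=2))
```

A record of this kind is readable by humans (through `alternateName`) and by machines (through the numeric rating), and it could be carried as the body of a Web Annotation that targets the reviewed page.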
Goal: provide technologies to the user, with which
they can consume, assess, analyse, verify and
process digital content and media in a better way and
that indicate which contents may be problematic.
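The aggregation step that the architecture assigns to the browser can be sketched as follows: machine-generated metadata (MGM) from tools and user-generated metadata (UGM) from other users are grouped by label and turned into warnings. This is an illustrative assumption about how such an aggregation could work, not the method of the proposed infrastructure; the `Signal` class, the concern scores between 0 and 1, the averaging and the threshold of 0.5 are all invented for this sketch.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Signal:
    source: str   # tool or user that produced the signal
    kind: str     # "MGM" (machine-generated) or "UGM" (user-generated)
    label: str    # standardised category, e.g. "hate_speech"
    score: float  # 0.0 (no concern) to 1.0 (strong concern)


def aggregate(signals, threshold=0.5):
    """Group signals by label and return a warning for every label
    whose mean concern score reaches the threshold."""
    scores_by_label = {}
    for signal in signals:
        scores_by_label.setdefault(signal.label, []).append(signal.score)
    return sorted(
        f"{label}: mean concern {mean(scores):.2f} from {len(scores)} signal(s)"
        for label, scores in scores_by_label.items()
        if mean(scores) >= threshold
    )


# Invented signals for one web page.
signals = [
    Signal("Tool1", "MGM", "hate_speech", 0.1),
    Signal("Tool3", "MGM", "factually_disputed", 0.8),
    Signal("user:alice", "UGM", "factually_disputed", 0.6),
    Signal("user:bob", "UGM", "intentionally_deceptive", 0.4),
]
warnings = aggregate(signals)
print(warnings)
```

A plain average treats every source as equally trustworthy. A deployed system would need some way to weight sources, since annotations themselves can become a target for manipulation.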
Web Annotation + Fake News
• Crowd-sourced Web Annotation content in combination
with a set of automatic analysis tools has enormous
potential to tackle online misinformation campaigns.
• Big impact if deployed widely and implemented correctly.
• However, there is a danger that the point of attack simply shifts:
misinformation campaigns could then exploit the annotations themselves.
• The Credibility Coalition has developed a similar
approach in parallel, see, e.g.,
https://web.hypothes.is/blog/annotation-powered-questionnaires/
Annotations and
Open Science
Open Science
• Movement to make scientific research, data
and dissemination accessible to all levels of
an inquiring society, amateur or professional.
• Encompasses practices such as
publishing open research, campaigning
for open access, encouraging scientists to
practice open notebook science, and
generally making it easier to publish and
communicate scientific knowledge.
• Connection to: annotations, research data
(corpora, LRs), semantics, knowledge,
linked data, repositories and other topics.
https://en.wikipedia.org/wiki/Open_science
Open Science Taxonomy
https://en.wikipedia.org/wiki/Open_science
Annotations & Open Science
• Open Science will soon become the norm and goal in
data-intensive science
• Important aspects: interoperability, reproducibility, open
documentation of experiments, use of standards etc.
• Trend: open tools, open workflows, open data sets
• Annotations are an important and crucial piece of the
puzzle, especially documented, meaningful annotations
• Relevant initiatives: NFDI, EOSC
• Relevant principle: FAIR
FAIR Principles
• TO BE FINDABLE:
– F1 (meta)data are assigned a globally unique and eternally persistent identifier.
– F2 data are described with rich metadata.
– F3 (meta)data are registered or indexed in a searchable resource.
– F4 metadata specify the data identifier.
• TO BE ACCESSIBLE:
– A1 (meta)data are retrievable by their identifier using a standardized protocol.
– A1.1 the protocol is open, free, and universally implementable.
– A1.2 the protocol allows for an authentication and authorization procedure.
– A2 metadata are accessible, even when the data are no longer available.
• TO BE INTEROPERABLE:
– I1 (meta)data use a formal, accessible, shared, and broadly applicable language for
knowledge representation.
– I2 (meta)data use vocabularies that follow FAIR principles.
– I3 (meta)data include qualified references to other (meta)data.
• TO BE RE-USABLE:
– R1 (meta)data have a plurality of accurate and relevant attributes.
– R1.1 (meta)data are released with a clear and accessible data usage license.
– R1.2 (meta)data are associated with their provenance.
– R1.3 (meta)data meet domain-relevant community standards.
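Some of these principles can be checked mechanically on a metadata record. The sketch below tests only for the presence of four fields and maps each one to the principle it supports. The field names, the record and the selection of principles are assumptions for illustration; a presence check says nothing about whether an identifier is actually persistent or whether the metadata are actually rich.

```python
# Hypothetical mapping from metadata field to the FAIR principle it supports.
CHECKED_FIELDS = {
    "identifier": "F1: globally unique and persistent identifier",
    "description": "F2: rich metadata",
    "license": "R1.1: clear and accessible data usage license",
    "provenance": "R1.2: provenance",
}


def unmet_principles(record: dict) -> list:
    """Return the principles for which the record has no (non-empty) field."""
    return [
        principle
        for field, principle in CHECKED_FIELDS.items()
        if not record.get(field)
    ]


# Invented metadata record for an annotated corpus.
record = {
    "identifier": "https://example.org/corpora/42",
    "description": "Newspaper corpus with named entity annotations.",
    "license": "CC-BY-4.0",
    # no "provenance" field
}
gaps = unmet_principles(record)
print(gaps)  # ['R1.2: provenance']
```

For annotations, the provenance field is where information such as the annotation scheme, the annotators and the tools used would be documented.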
Open Science and … Science
• Open Science approaches recommend the use of standards
• Only standardised data and metadata are truly interoperable
• BUT fundamental research is about inventing NEW things
• This contradicts the use of standards as the consensus that
was reached within a specific community
• However, it does NOT contradict the use of established tools
and best practice approaches
• Neither does it contradict the modification of standards
• At the end of the day, it’s about semantics & documentation
• If an established, standardised approach does not work for a
new piece of research, invent a new approach or get creative!
Annotation of Documents
• Open Science will be transforming research, making it
more sustainable, more visible, more transparent
• Substantially improved digital infrastructures
• This will, soon, include the annotation of documents,
starting with scientific publications (Web Annotation)
• First steps towards Open Peer Review (cf. arxiv.org)
• Trend: micro-publications (esp. for incremental research)
• Will the scientific paper continue to be the atomic unit?
• Important relevant initiative: ORKG
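As a rough illustration of what annotating a scientific publication via Web Annotation could look like, the following Python sketch builds a minimal annotation shaped after the W3C Web Annotation Data Model and serialises it as JSON. The publication URL, the quoted passage and the comment are made-up placeholders, and the helper name `make_annotation` is hypothetical.

```python
import json


def make_annotation(target_url, quoted_text, comment):
    """Build a minimal annotation dict shaped after the W3C Web Annotation Data Model."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "commenting",
        # The body is the secondary data: here, a plain-text comment.
        "body": {
            "type": "TextualBody",
            "value": comment,
            "format": "text/plain",
        },
        # The target is the primary data: a quoted passage inside a document.
        "target": {
            "source": target_url,
            "selector": {
                "type": "TextQuoteSelector",
                "exact": quoted_text,
            },
        },
    }


annotation = make_annotation(
    "https://example.org/paper.html",  # hypothetical publication URL
    "annotations are secondary data",
    "Which annotation schema was used here?",
)
print(json.dumps(annotation, indent=2))
```

A reviewer comment, a correction or a link to a data set would use the same shape with a different body or motivation.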
ORKG
• Vision driven forward by Sören Auer (TIB Hannover)
• Exchange of scholarly knowledge is primarily
document-based: researchers produce articles (online
or offline) as coarse-grained text documents.
• Transform this predominant paradigm into knowledge-
based information flows by representing and expressing
knowledge through semantically rich, interlinked graphs.
• Sören Auer et al. (2018): “Towards an Open Research
Knowledge Graph”.
https://doi.org/10.5281/zenodo.1157185
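To make the move from documents to knowledge-based information flows more concrete, here is a small Python sketch that stores research contributions as subject-predicate-object triples and queries them. It is not the ORKG data model or API; all identifiers and predicate names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical triples; identifiers and predicates are not taken from the ORKG.
triples = [
    ("paper:1", "has_contribution", "contribution:1"),
    ("contribution:1", "addresses_problem", "problem:offensive_language_detection"),
    ("contribution:1", "uses_method", "method:svm"),
    ("paper:2", "has_contribution", "contribution:2"),
    ("contribution:2", "addresses_problem", "problem:offensive_language_detection"),
    ("contribution:2", "uses_method", "method:cnn"),
]


def build_index(triples):
    """Index subjects by (predicate, object) so that lookups need no scanning."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(predicate, obj)].append(subject)
    return index


def papers_addressing(problem, triples):
    """Return all papers with a contribution that addresses the given problem."""
    index = build_index(triples)
    contributions = index.get(("addresses_problem", problem), [])
    papers = set()
    for contribution in contributions:
        papers.update(index.get(("has_contribution", contribution), []))
    return sorted(papers)


print(papers_addressing("problem:offensive_language_detection", triples))
```

With coarse-grained text documents, the same question would require reading both papers; in the graph it is a lookup.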
Interlinking of Concepts
[Excerpt and figure from Auer et al. (2018); the excerpt is cropped on the slide. Figure caption: Interlinking of interdisciplinary and subject-specific concepts and artefacts of scientific work in the different domains (here: TIB subject areas). Labels on the slide: Linked Open Data Cloud, Semantic Web, Standards, Persistent Identifiers, GND, European Open Science Cloud.]
Annotations and Markup
Annotations and Markup
• Complex topic – we can only scratch the surface
• XML is – unfortunately – considered “done” within W3C;
all senior XML specialists have left the organisation.
• https://www.balisage.net/Proceedings/vol21/html/Tovey01/BalisageVol21-Tovey01.html
– Discussion on the trend from declarative to procedural (!)
markup – there’s stagnation in the markup world.
• Relevant and timely: https://markupdeclaration.org
• Markup is not dead – there’s a small but active and
passionate community.
Dimensions of
Annotations
Annotations
• Annotation – Definition:
Secondary data added to a piece of primary data –
in science, this is, often, research data.
• The secondary data is, typically, a property of part of the
primary research data.
• Let’s examine this a bit more closely.
Annotations
[Diagram: a span of text (Lorem ipsum placeholder) is annotated with a property. The property has a label and a value and may carry a pointer to an annotation schema (possibly external), which may constrain or restrict it. This slide looks at the label of the property.]
Examples: lemma, part of speech, instance-of etc.
• What is the conceptual nature of this property? Is it
best practice in research or can it be entirely made up?
• How many colleagues in the community agree on it?
• Is the label adequate and self-explanatory?
Annotations
[Diagram, as on the previous slide: text, property (label and value), pointer to an annotation schema (possibly external) that may constrain or restrict the property. This slide looks at the value of the property.]
Examples: adjective, JJ, object, “some free text comment” etc.
• The actual annotation payload
• Is the value free text or taken from a shared vocabulary?
• Is the shared vocabulary prescribed by an annotation schema or ontology?
• How many colleagues in the community agree on the value?
• How many colleagues in the community agree on the shared vocabulary?
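The parts named in the diagram (text span, label, value, pointer to a schema that may constrain the value) can be sketched as a small data structure. This is only one possible reading of the slide; the schema contents, the schema URL and all names below are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema: maps a property label to its permitted values.
# None means the value is free text.
SCHEMA = {
    "part-of-speech": {"JJ", "NN", "DT", "VB"},
    "comment": None,
}


@dataclass
class Annotation:
    start: int                    # character offset where the annotated span begins
    end: int                      # character offset where the span ends (exclusive)
    label: str                    # label of the property, e.g. "part-of-speech"
    value: str                    # value of the property, e.g. "JJ"
    schema: Optional[str] = None  # pointer to the annotation schema, if any


def is_valid(annotation, schema=SCHEMA):
    """Check that the label is known and the value is permitted by the schema."""
    if annotation.label not in schema:
        return False
    allowed = schema[annotation.label]
    return allowed is None or annotation.value in allowed


text = "Lorem ipsum dolor sit amet"
pos = Annotation(0, 5, "part-of-speech", "NN", "https://example.org/schema/pos")
free = Annotation(6, 11, "comment", "some free text comment")
print(text[pos.start:pos.end], is_valid(pos), is_valid(free))
```

The free-text value "adjective" would be rejected for the label "part-of-speech" here, because the assumed shared vocabulary only permits tags such as JJ.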
Annotations
[Diagram, as on the previous slides, now with many annotations on the same text: each property has a label and a value and may point to an annotation schema (possibly external) that may constrain or restrict it.]
• Is there structure among the different properties?
• Markup languages, markup grammars
• Syntactic structure
– Ex.: “HVBXJ” => “AHXB”, “HKVZ”
• Semantic, i.e., logical structure
– Ex.: “NP” => “DET”, “N”
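The idea that a grammar can make structure among annotations explicit can be sketched as follows. The only rule is the slide's own example (“NP” => “DET”, “N”); the tree representation and the function name `conforms` are assumptions for illustration, not an existing markup grammar formalism.

```python
# Grammar with the single rule from the slide: an NP consists of DET followed by N.
GRAMMAR = {
    "NP": [("DET", "N")],
}


def conforms(tree, grammar=GRAMMAR):
    """Check a tree of (label, children) pairs against the grammar.

    Leaves are written as (label, []). Labels without a rule are unconstrained.
    """
    label, children = tree
    if not children:
        return True
    child_labels = tuple(child[0] for child in children)
    if label in grammar and child_labels not in grammar[label]:
        return False
    return all(conforms(child, grammar) for child in children)


good = ("NP", [("DET", []), ("N", [])])
bad = ("NP", [("N", []), ("DET", [])])
print(conforms(good), conforms(bad))
```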
Annotating Annotations
Annotations on annotations (just a few selected points)
• Source (machine vs. single human vs. crowd-sourced)
• Application scenario: annotations for human vs. machine consumption
• Purpose or scope of the annotation (e.g., document structure, layout or
style, semantics, rhetorical structure, linguistic properties etc.)
– Can the structure be made explicit by the annotation format,
maybe via a markup language’s grammar?
– Can structure be made explicit through an ontology
that is put on top of the individual properties?
• Confidence value
• Quality indicator (0..1)
• Time added, time modified (timestamp)
• Style information – how annotations are rendered
• Annotation layers – one or multiple layers, independent or interrelated?
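Several of the points above (source, confidence, timestamp, layer) can be attached to each annotation as metadata and then used for filtering. The sketch below assumes a flat record per annotation; the field names and the example records are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AnnotationRecord:
    label: str
    value: str
    source: str        # "machine", "human" or "crowd"
    confidence: float  # confidence value in the range 0..1
    layer: str         # annotation layer this record belongs to
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def select(records, min_confidence=0.0, sources=None, layer=None):
    """Filter records by confidence, source and layer; None means no restriction."""
    return [
        r for r in records
        if r.confidence >= min_confidence
        and (sources is None or r.source in sources)
        and (layer is None or r.layer == layer)
    ]


records = [
    AnnotationRecord("part-of-speech", "JJ", "machine", 0.62, "morphosyntax"),
    AnnotationRecord("part-of-speech", "NN", "human", 0.95, "morphosyntax"),
    AnnotationRecord("rhetorical-relation", "contrast", "crowd", 0.80, "discourse"),
]
trusted = select(records, min_confidence=0.8, sources={"human", "crowd"})
print([r.value for r in trusted])
```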
Evaluation of Annotations
• Measuring inter-annotator agreement
• Measuring intra-annotator agreement – what if the same
person does the same annotation task again after a
week or a month?
• Test replicability and reproducibility
• Important exercise for:
– Emerging annotation formats
– Complex annotation exercises
– Measuring consensus
– Making sure that terms and labels are meaningful
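One common way to measure agreement between two annotators is Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The slides do not name a specific measure, so this choice is an assumption; the label sequences below are invented. The same function can serve for intra-annotator agreement by comparing two passes of the same person.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: based on each annotator's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


annotator_1 = ["OFFENSE", "OTHER", "OTHER", "OFFENSE"]
annotator_2 = ["OFFENSE", "OTHER", "OFFENSE", "OFFENSE"]
print(cohen_kappa(annotator_1, annotator_2))
```

In this example the annotators agree on three of four items (0.75), chance agreement is 0.5, and kappa is therefore 0.5.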
Complexity of Annotations
• In (Computational) Linguistics we’ve designed some
fairly detailed annotation formats in the last 30 years.
• In contrast, many modern data sets (especially for data-
driven AI approaches in NLP) are quite shallow.
• AI classifiers need enormous amounts of data and just a
few high-level labels.
• It’s not feasible, and too expensive, to annotate data with
complex and sophisticated annotation formats.
• Is NLP/AI research forgetting annotation principles?
• Are we dumbing down linguistics to the simple
annotation of trivial labels?
• Has annotation research perhaps become obsolete?
Complexity of Annotations
• Example: GermEval 2018 data set
Tweet label, tweet label, tweet label etc.
• There is no structure, no concretisation, no hierarchical
information, no additional metadata
• Two observations:
– there’s a trend towards simply more annotations, i.e.,
increased quantity while ignoring quality, complexity and
structure – complex annotations are expensive and difficult
to generalise from.
– there’s a trend towards dumb annotations, which are
often crowd-sourced – it’s easier to generalise from simple
than from structured, hierarchical annotations.
Summary and
Conclusions
Summary
• Annotations: from trivial to very complex
• From experimental to highly (de facto) standardised
• Annotations of annotations
• Multi-layer annotations – independent or interrelated
• Interoperability and reusability through standards
• But: standards vs. flexibility – basic science vs. applied
• Nowadays, annotations usually happen on the web
• Powerful stack of W3C technologies:
Web Annotation, Semantic Web, Linked Data, XML
• Web-scale annotations for scholarly publishing
• Annotations for Open Science
Summary
• Language Technology …
• … to automate the generation of annotations
– Semantification of journalistic/media content
– Semantification of scientific content
• … to automate the analysis of annotations
– Annotations for Open Science
• … to restore credibility and trust in the media
• In AI, annotations in data sets are often trivial
– Trend towards simply more and more annotations
– Trend towards more and more simple annotations
Annotating Annotations
• Different Dimensions of Annotations
• Is it possible to tie all dimensions together in a compact,
machine-readable way to describe and document an
annotation project?
– Complexity
– Semantics
– Source
– Impact
– Standard
– Research Question
– Methodology
– …
• Relevant for Open Science
• Relevant for interoperability
• Relevant for search & retrieval
• Relevant for reproducibility
• Relevant for evaluation
• Relevant for documentation & repos
• Relevant for good scientific practice
• … but maybe this is all too complicated
because a scientific paper already
does the trick in an established way?
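One way to picture such a compact, machine-readable description is a small record with one field per dimension listed above, plus a check for dimensions that were left undocumented. This is a speculative sketch rather than a proposal from the talk; the field names and all values are invented.

```python
import json

# The dimensions named on the slide, as hypothetical field names.
DIMENSIONS = (
    "complexity",
    "semantics",
    "source",
    "impact",
    "standard",
    "research_question",
    "methodology",
)

# Invented description of an imaginary annotation project.
project = {
    "name": "example-annotation-project",
    "complexity": "flat labels, single layer",
    "semantics": "coarse-grained offensive language classes",
    "source": "crowd-sourced",
    "impact": "",  # deliberately left empty to show the check below
    "standard": "none (experimental format)",
    "research_question": "Can offensive tweets be detected automatically?",
    "methodology": "two annotators per item, disagreements adjudicated",
}


def missing_dimensions(description):
    """Return the dimensions that are absent or empty in a project description."""
    return [d for d in DIMENSIONS if not description.get(d)]


print(json.dumps(project, indent=2))
print(missing_dimensions(project))
```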
Four Quadrant Diagram
[Diagram with two axes: “Basic research” vs. “Applications and solutions”, and “Humanities research” vs. “Computer Science and ICT research”. Positions marked in the diagram:]
• No need for standardisation, no need to use standards
• Clear need to use standards for maximum adoption
• Avantgarde formats, weird phenomena, weird needs, expressibility
• Performance, standards, interoperability
• AI
• Markup, formal languages, querying, overlap
• Digital Humanities
Number of users: rather small vs. rather high
This diagram is work in progress.
Thank you!
Dr. Georg Rehm
Principal Researcher and Research Fellow
Speech and Language Technology Lab
DFKI, Berlin, Germany
georg.rehm@dfki.de
http://georg-re.hm
http://de.linkedin.com/in/georgrehm
https://www.slideshare.net/georgrehm
With many thanks to (in alphabetical order):
• Ivan Herman (W3C, The Netherlands)
• Heather Staines, Jon Udell, Dan Whaley (Hypothes.is, USA)

Más contenido relacionado

Similar a Observations on Annotations – From Computational Linguistics and the World Wide Web to Artificial Intelligence and back again

Excitement introduction
Excitement introductionExcitement introduction
Excitement introductionPnina Lorman
 
slides_ZU_Text_mining_final (MEDIUM).pdf
slides_ZU_Text_mining_final (MEDIUM).pdfslides_ZU_Text_mining_final (MEDIUM).pdf
slides_ZU_Text_mining_final (MEDIUM).pdfPetr Korab
 
SoundSoftware: Software Sustainability for audio and Music Researchers
SoundSoftware: Software Sustainability for audio and Music Researchers SoundSoftware: Software Sustainability for audio and Music Researchers
SoundSoftware: Software Sustainability for audio and Music Researchers SoundSoftware ac.uk
 
Web Annotations – A Game Changer for Language Technology?
Web Annotations – A Game Changer for Language Technology?Web Annotations – A Game Changer for Language Technology?
Web Annotations – A Game Changer for Language Technology?Georg Rehm
 
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Heiko Paulheim
 
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Enrico Motta
 
Open Access Repositories & Interoperable Usage Statistics: Current Developmen...
Open Access Repositories & Interoperable Usage Statistics: Current Developmen...Open Access Repositories & Interoperable Usage Statistics: Current Developmen...
Open Access Repositories & Interoperable Usage Statistics: Current Developmen...uherb
 
OLE Project Webinr - Conversation with CUFTS April 8 2009
OLE Project Webinr - Conversation with CUFTS April 8 2009OLE Project Webinr - Conversation with CUFTS April 8 2009
OLE Project Webinr - Conversation with CUFTS April 8 2009John Little
 
META-NET and META-SHARE: An Overview
META-NET and META-SHARE: An OverviewMETA-NET and META-SHARE: An Overview
META-NET and META-SHARE: An OverviewGeorg Rehm
 
CNI fall 2009 enhanced publications john_doove-SURFfoundation
CNI fall 2009 enhanced publications john_doove-SURFfoundationCNI fall 2009 enhanced publications john_doove-SURFfoundation
CNI fall 2009 enhanced publications john_doove-SURFfoundationJohn Doove
 
Semantic Web in the Plateau of Productivity
Semantic Web in the Plateau of ProductivitySemantic Web in the Plateau of Productivity
Semantic Web in the Plateau of ProductivityIoannis Stavrakantonakis
 
New directions for blog network mapping [with Lars Kirchhoff and Thomas Nicol...
New directions for blog network mapping [with Lars Kirchhoff and Thomas Nicol...New directions for blog network mapping [with Lars Kirchhoff and Thomas Nicol...
New directions for blog network mapping [with Lars Kirchhoff and Thomas Nicol...Tim Highfield
 
eLanguage.net: Shifting the paradigm in Linguistics
eLanguage.net: Shifting the paradigm in LinguisticseLanguage.net: Shifting the paradigm in Linguistics
eLanguage.net: Shifting the paradigm in LinguisticsCornelius Puschmann
 
Analyzing Social Media with Digital Methods. Possibilities, Requirements, and...
Analyzing Social Media with Digital Methods. Possibilities, Requirements, and...Analyzing Social Media with Digital Methods. Possibilities, Requirements, and...
Analyzing Social Media with Digital Methods. Possibilities, Requirements, and...Bernhard Rieder
 
Strategies for implementing ePortfolios in Higher Education
Strategies for implementing ePortfolios in Higher EducationStrategies for implementing ePortfolios in Higher Education
Strategies for implementing ePortfolios in Higher EducationPeter Baumgartner
 
Strategies for implementing ePortfolios in Higher Education
Strategies for implementing ePortfolios in Higher EducationStrategies for implementing ePortfolios in Higher Education
Strategies for implementing ePortfolios in Higher EducationEPNET-Europortfolio
 
Tut mathematics and hypermedia research seminar 2011 11-11
Tut mathematics and hypermedia research seminar 2011 11-11Tut mathematics and hypermedia research seminar 2011 11-11
Tut mathematics and hypermedia research seminar 2011 11-11Yleisradio
 
Monitoring the transformation of a domain-specific portal into a social infor...
Monitoring the transformation of a domain-specific portal into a social infor...Monitoring the transformation of a domain-specific portal into a social infor...
Monitoring the transformation of a domain-specific portal into a social infor...Ramón OVELAR
 

Similar a Observations on Annotations – From Computational Linguistics and the World Wide Web to Artificial Intelligence and back again (20)

Excitement introduction
Excitement introductionExcitement introduction
Excitement introduction
 
slides_ZU_Text_mining_final (MEDIUM).pdf
slides_ZU_Text_mining_final (MEDIUM).pdfslides_ZU_Text_mining_final (MEDIUM).pdf
slides_ZU_Text_mining_final (MEDIUM).pdf
 
SoundSoftware: Software Sustainability for audio and Music Researchers
SoundSoftware: Software Sustainability for audio and Music Researchers SoundSoftware: Software Sustainability for audio and Music Researchers
SoundSoftware: Software Sustainability for audio and Music Researchers
 
ENP Belgrade WS Metadata
ENP Belgrade WS MetadataENP Belgrade WS Metadata
ENP Belgrade WS Metadata
 
Web Annotations – A Game Changer for Language Technology?
Web Annotations – A Game Changer for Language Technology?Web Annotations – A Game Changer for Language Technology?
Web Annotations – A Game Changer for Language Technology?
 
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
 
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
 
Open Access Repositories & Interoperable Usage Statistics: Current Developmen...
Open Access Repositories & Interoperable Usage Statistics: Current Developmen...Open Access Repositories & Interoperable Usage Statistics: Current Developmen...
Open Access Repositories & Interoperable Usage Statistics: Current Developmen...
 
Kaindl "Managing Information Overload"
Kaindl "Managing Information Overload"Kaindl "Managing Information Overload"
Kaindl "Managing Information Overload"
 
OLE Project Webinr - Conversation with CUFTS April 8 2009
OLE Project Webinr - Conversation with CUFTS April 8 2009OLE Project Webinr - Conversation with CUFTS April 8 2009
OLE Project Webinr - Conversation with CUFTS April 8 2009
 
META-NET and META-SHARE: An Overview
META-NET and META-SHARE: An OverviewMETA-NET and META-SHARE: An Overview
META-NET and META-SHARE: An Overview
 
CNI fall 2009 enhanced publications john_doove-SURFfoundation
CNI fall 2009 enhanced publications john_doove-SURFfoundationCNI fall 2009 enhanced publications john_doove-SURFfoundation
CNI fall 2009 enhanced publications john_doove-SURFfoundation
 
Semantic Web in the Plateau of Productivity
Semantic Web in the Plateau of ProductivitySemantic Web in the Plateau of Productivity
Semantic Web in the Plateau of Productivity
 
New directions for blog network mapping [with Lars Kirchhoff and Thomas Nicol...
New directions for blog network mapping [with Lars Kirchhoff and Thomas Nicol...New directions for blog network mapping [with Lars Kirchhoff and Thomas Nicol...
New directions for blog network mapping [with Lars Kirchhoff and Thomas Nicol...
 
eLanguage.net: Shifting the paradigm in Linguistics
eLanguage.net: Shifting the paradigm in LinguisticseLanguage.net: Shifting the paradigm in Linguistics
eLanguage.net: Shifting the paradigm in Linguistics
 
Analyzing Social Media with Digital Methods. Possibilities, Requirements, and...
Analyzing Social Media with Digital Methods. Possibilities, Requirements, and...Analyzing Social Media with Digital Methods. Possibilities, Requirements, and...
Analyzing Social Media with Digital Methods. Possibilities, Requirements, and...
 
Strategies for implementing ePortfolios in Higher Education
Strategies for implementing ePortfolios in Higher EducationStrategies for implementing ePortfolios in Higher Education
Strategies for implementing ePortfolios in Higher Education
 
Strategies for implementing ePortfolios in Higher Education
Strategies for implementing ePortfolios in Higher EducationStrategies for implementing ePortfolios in Higher Education
Strategies for implementing ePortfolios in Higher Education
 
Tut mathematics and hypermedia research seminar 2011 11-11
Tut mathematics and hypermedia research seminar 2011 11-11Tut mathematics and hypermedia research seminar 2011 11-11
Tut mathematics and hypermedia research seminar 2011 11-11
 
Monitoring the transformation of a domain-specific portal into a social infor...
Monitoring the transformation of a domain-specific portal into a social infor...Monitoring the transformation of a domain-specific portal into a social infor...
Monitoring the transformation of a domain-specific portal into a social infor...
 

Más de Georg Rehm

QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...
QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...
QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...Georg Rehm
 
The Preparation, Impact and Future of the META-NET White Paper Series “Europe...
The Preparation, Impact and Future of the META-NET White Paper Series “Europe...The Preparation, Impact and Future of the META-NET White Paper Series “Europe...
The Preparation, Impact and Future of the META-NET White Paper Series “Europe...Georg Rehm
 
AI and Conference Interpretation – From Smart Assistants for the Human Interp...
AI and Conference Interpretation – From Smart Assistants for the Human Interp...AI and Conference Interpretation – From Smart Assistants for the Human Interp...
AI and Conference Interpretation – From Smart Assistants for the Human Interp...Georg Rehm
 
Künstliche Intelligenz beim Dolmetschen und Übersetzen
Künstliche Intelligenz beim Dolmetschen und ÜbersetzenKünstliche Intelligenz beim Dolmetschen und Übersetzen
Künstliche Intelligenz beim Dolmetschen und ÜbersetzenGeorg Rehm
 
Herausforderungen und Lösungen für die europäische Sprachtechnologie- Forschu...
Herausforderungen und Lösungen für die europäische Sprachtechnologie- Forschu...Herausforderungen und Lösungen für die europäische Sprachtechnologie- Forschu...
Herausforderungen und Lösungen für die europäische Sprachtechnologie- Forschu...Georg Rehm
 
European Language Technologies – Past, Present and Future
European Language Technologies – Past, Present and FutureEuropean Language Technologies – Past, Present and Future
European Language Technologies – Past, Present and FutureGeorg Rehm
 
Towards a Human Language Project for Multilingual Europe: AI and Interpretation
Towards a Human Language Project for Multilingual Europe: AI and InterpretationTowards a Human Language Project for Multilingual Europe: AI and Interpretation
Towards a Human Language Project for Multilingual Europe: AI and InterpretationGeorg Rehm
 
KI, Sprachtechnologie und Digital Humanities: Ein (unvollständiger) Überblick
KI, Sprachtechnologie und Digital Humanities: Ein (unvollständiger) ÜberblickKI, Sprachtechnologie und Digital Humanities: Ein (unvollständiger) Überblick
KI, Sprachtechnologie und Digital Humanities: Ein (unvollständiger) ÜberblickGeorg Rehm
 
Language Technologies for Multilingual Europe - Towards a Human Language Proj...
Language Technologies for Multilingual Europe - Towards a Human Language Proj...Language Technologies for Multilingual Europe - Towards a Human Language Proj...
Language Technologies for Multilingual Europe - Towards a Human Language Proj...Georg Rehm
 
AI for Translation Technologies and Multilingual Europe
AI for Translation Technologies and Multilingual EuropeAI for Translation Technologies and Multilingual Europe
AI for Translation Technologies and Multilingual EuropeGeorg Rehm
 
Kuratieren im Zeitalter der KI
Kuratieren im Zeitalter der KIKuratieren im Zeitalter der KI
Kuratieren im Zeitalter der KIGeorg Rehm
 
Artificial Intelligence for the Film Industry
Artificial Intelligence for the Film IndustryArtificial Intelligence for the Film Industry
Artificial Intelligence for the Film IndustryGeorg Rehm
 
KI für die Kundenkommunikation
KI für die KundenkommunikationKI für die Kundenkommunikation
KI für die KundenkommunikationGeorg Rehm
 
Transformieren, Manipulieren, Kuratieren: Technologien für die Wissensarbeit ...
Transformieren, Manipulieren, Kuratieren: Technologien für die Wissensarbeit ...Transformieren, Manipulieren, Kuratieren: Technologien für die Wissensarbeit ...
Transformieren, Manipulieren, Kuratieren: Technologien für die Wissensarbeit ...Georg Rehm
 
Digitale Kuratierungstechnologien: Anwendungsfälle in Digitalen Bibliotheken
Digitale Kuratierungstechnologien: Anwendungsfälle in Digitalen BibliothekenDigitale Kuratierungstechnologien: Anwendungsfälle in Digitalen Bibliotheken
Digitale Kuratierungstechnologien: Anwendungsfälle in Digitalen BibliothekenGeorg Rehm
 
EPUB, quo vadis? Publishing im W3C
EPUB, quo vadis? Publishing im W3CEPUB, quo vadis? Publishing im W3C
EPUB, quo vadis? Publishing im W3CGeorg Rehm
 
Human Language Technologies in a Multilingual Europe
Human Language Technologies in a Multilingual EuropeHuman Language Technologies in a Multilingual Europe
Human Language Technologies in a Multilingual EuropeGeorg Rehm
 
Language Technologies for Big Data – A Strategic Agenda for the Multilingual ...
Language Technologies for Big Data – A Strategic Agenda for the Multilingual ...Language Technologies for Big Data – A Strategic Agenda for the Multilingual ...
Language Technologies for Big Data – A Strategic Agenda for the Multilingual ...Georg Rehm
 
Multilingual Europe in late 2016 – A Strategic Research and Innovation Agenda...
Multilingual Europe in late 2016 – A Strategic Research and Innovation Agenda...Multilingual Europe in late 2016 – A Strategic Research and Innovation Agenda...
Multilingual Europe in late 2016 – A Strategic Research and Innovation Agenda...Georg Rehm
 
Multilingualism for Digital Europe
Multilingualism for Digital EuropeMultilingualism for Digital Europe
Multilingualism for Digital EuropeGeorg Rehm
 

Observations on Annotations – From Computational Linguistics and the World Wide Web to Artificial Intelligence and back again

  • 1. Observations on Annotations: From Computational Linguistics and the World Wide Web to AI and back again
    Georg Rehm, German Research Center for Artificial Intelligence (DFKI) GmbH
    Annotation in scholarly editions and research, Bergische Universität Wuppertal – 21 February 2019
  • 2. Annotation: Personal Background
    Computational Linguistics and AI (since 1992); SGML and TEI (since 1995); XML (since 1998); XSLT; XPath; several others ...; Corpus annotation formats; Hypertext and Textlinguistics; Web Technologies, W3C, Markup Languages; W3C Office Germany/Austria (since 2013); AI and Language Technology Development (since 2009); Infrastructures and Platforms; Service Deployment; Research Data; Language Resources; Metadata; Data Formats; Open Science
    Observations on Annotations – Wuppertal, Germany, 21 February 2019
  • 3. Introduction
    • Annotations have been playing an important role in Computational Linguistics and related fields (especially Digital Humanities) for decades.
    • This talk: Recent examples, lessons learned and some general observations on annotations.
    • My own research in this area (since approx. 1996):
      – from basic and applied research to
      – innovation and technology development
  • 4. Outline
    • Annotations – brief definition
    • World Wide Web
    • Annotations and AI
    • Annotations and Computational Linguistics
    • Annotations and Language Technology
    • Annotations for a Credible Web
    • Annotations and Open Science
    • Annotations and Markup
    • Dimensions of Annotations
    • Summary and Conclusions
  • 5. Annotations: a brief definition
  • 6. Annotations
    • Definition/“Definition”: Secondary data added to a piece of primary data – in science, this is often research data.
    • Wikipedia: An annotation is a metadatum (e.g., a post, explanation, markup) attached to [a?] location or other data.
    http://www.merriam-webster.com
  • 7. Annotations (continued)
    • Literature and education:
      – Textual scholarship: Textual scholarship is a discipline that often uses the technique of annotation to describe or add additional historical context to texts and physical documents.
      – Learning and instruction: As part of guided noticing [annotation] involves highlighting, naming or labelling and commenting aspects of visual representations to help focus learners' attention on specific visual aspects. In other words, it means the assignment of typological representations (culturally meaningful categories), to topological representations (e.g. images).
    • Software engineering:
      – Text documents: Markup languages like XML and HTML annotate text in a way that is syntactically distinguishable from that text. They can be used to add information about the desired visual presentation, or machine-readable semantic information, as in the semantic web.
    • Linguistics:
      – In linguistics, annotations include comments and metadata; these non-transcriptional annotations are also non-linguistic.
  • 8. World Wide Web
  • 12. “Vague but exciting”
    Information Management: A Proposal. Tim Berners-Lee, CERN, March 1989, May 1990
    “Private links: One must be able to add one's own private links to and from public information. One must also be able to annotate links, as well as nodes, privately.”
  • 13. World Wide Web Consortium
    • W3C is an international non-profit, member-financed standards developing organisation
    • Founded in 1994 by Sir Tim Berners-Lee
    • Currently 451 members – 23 in Germany/Austria
    • Approx. 60 staff (ERCIM, MIT, Keio University, Beihang University)
    • Approx. 20 offices in important regions
    • The W3C Office Germany/Austria is run by DFKI
    • Open Web Platform, HTML5, CSS, Credible Web, Digital Publishing, Linked Data etc.
    http://w3.org, http://w3c.de
    Interested in joining? Talk to me!
  • 14. Relevant W3C Standards
    • XML – Extensible Markup Language
      – Extremely influential
      – Widely adopted
      – TEI and many other languages
    • Semantic Web
      – RDF, OWL, SPARQL, SKOS etc.
    • Digital Publishing
      – New versions of EPUB
    • Web Annotation Data Model and Vocabulary
    https://www.w3.org/2001/10/03-sww-1/slide7-0.html
  • 15. Web Annotation
  • 16. Web Annotations
    • Web Annotation – Three W3C Recommendations
    • Most popular and relevant implementation: Hypothes.is
      – Mission-driven, non-profit Open Source company
      – Main focus on scholarly publishing (“Annotating All Knowledge Coalition”)
      – Very active and vibrant community
    • Hypothes.is: main driving force behind the I Annotate conference series
      – Open proceedings, very interesting programme, diverse speakers from several disciplines – consider attending!
      – Videos of almost all previous events available online
  • 17. Web Annotation Standard
    • Web Annotation Data Model: Describes the underlying Annotation Abstract Data Model as well as a JSON-LD serialization
    • Web Annotation Vocabulary: The Vocabulary which underpins the Web Annotation Data Model
    • Web Annotation Protocol: The HTTP API for publishing, syndicating, and distributing Web Annotations
    • Published on 23 February 2017
  • 18. Web Annotation Standard
    • What does this mean for end users?
      – Annotation: a set of connected resources, typically incl. a body and target – the body is related to the target.
      – No more comment widgets and silos!
      – Annotation capability can be built natively into the browser
      – Conversations can take place anywhere on the web and in a standards-based way
    • Why is this different?
      – Annotations can live separately from documents and are reunited and re-anchored in real-time
      – Annotations are under the control of the user
      – Users can form communities (across HTML, PDF etc.)
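The body/target structure and the idea of re-anchoring can be sketched in a few lines of Python. This is a minimal illustration, not part of the talk: the `@context` URL, the `Annotation` and `TextualBody` types and the `TextQuoteSelector` fields come from the W3C Web Annotation Data Model, while the helper names `make_annotation` and `reanchor`, the example URL and the plain substring matching are assumptions made for this sketch. Real clients such as Hypothes.is use more robust anchoring strategies.

```python
import json


def make_annotation(target_url, exact, comment, prefix="", suffix=""):
    """Build a minimal Web Annotation (JSON-LD) as a plain dict.

    Hypothetical helper: only the keys and type names follow the
    W3C Web Annotation Data Model.
    """
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        # The body carries the secondary data (here: a plain-text comment).
        "body": {"type": "TextualBody", "value": comment, "format": "text/plain"},
        # The target identifies the primary data the body is about.
        "target": {
            "source": target_url,
            "selector": {
                "type": "TextQuoteSelector",
                "exact": exact,
                "prefix": prefix,
                "suffix": suffix,
            },
        },
    }


def reanchor(annotation, document_text):
    """Return (start, end) of the quoted passage in document_text, or None.

    Simplification: exact substring search on prefix + exact + suffix.
    """
    selector = annotation["target"]["selector"]
    needle = selector["prefix"] + selector["exact"] + selector["suffix"]
    position = document_text.find(needle)
    if position < 0:
        return None
    start = position + len(selector["prefix"])
    return start, start + len(selector["exact"])


anno = make_annotation(
    "https://example.org/page.html",  # placeholder URL
    exact="Berlin",
    comment="Capital of Germany",
    prefix="Welcome to ",
    suffix=" in",
)
print(json.dumps(anno, indent=2))
print(reanchor(anno, "Welcome to Berlin in 2019."))
```

Because the annotation stores a quote plus context rather than a position inside the document, it can live in a separate store and still be located again when the document is loaded.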
  • 20. Hypothes.is Statistics
    December 2018: 4.4 Million Annotations and Counting
    [Chart: annotations per month, January 2015 to December 2018, y-axis up to 260K; series: in groups (private), in groups (shared), private, public]
  • 21. The Hypothes.is Tool
    • Private notes
    • Public annotations
    • Collaboration groups
    • Linked Data connections
    • Cross format:
      – HTML
      – PDF
      – EPUB
      – Data
    • Community driven
    • Open Source
  • 22. Open Groups
  • 23. Errata and Corrections
  • 24. ADA: American Diabetes Association
    • Wanted a way to update content and add information links
    • Needed to restrict use to ADA staff
  • 25. Peer Review
  • 26. Automated Annotation
    Automated systems can tag elements such as RRIDs (Research Resource Identifiers) and other scholarly identifiers or entities, allowing navigation to background information and powerful search queries through other papers mentioning the same entity.
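As a rough illustration of such automated tagging, the sketch below finds RRID-like identifiers in running text and returns their character offsets, which is the information an annotation client would need to attach a link or note. The regular expression is a simplified assumption (the string "RRID:" followed by an authority prefix, an underscore and an alphanumeric identifier), not the official RRID grammar, and the identifiers in the sample sentence are made up.

```python
import re

# Simplified, assumed pattern: "RRID:" + letters + "_" + letters/digits.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z]+_[A-Za-z0-9]+)")


def tag_rrids(text):
    """Return one dict per RRID-like mention with its id and offsets."""
    return [
        {"id": match.group(1), "start": match.start(), "end": match.end()}
        for match in RRID_PATTERN.finditer(text)
    ]


# Made-up identifiers, for illustration only.
sample = "We used an antibody (RRID:AB_123456) and a tool (RRID:SCR_654321)."
for tag in tag_rrids(sample):
    print(tag["id"], sample[tag["start"]:tag["end"]])
```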
  • 27. User Profiles
  • 28. Use anywhere on the web
  • 29. Annotations and AI
  • 31. Artificial Intelligence
    • Data + Intelligence
    • Current breakthroughs based on Machine Learning (“Deep Learning”)
    • Also still in use: symbolic, rule-based methods and expert systems
    • Huge data sets + powerful learning algorithms + very fast hardware
  • 32. Annotations and AI
    • Modern AI is data-driven – supervised learning relies on annotated data sets.
    • However, certain AI algorithms can learn structure and patterns without any annotations whatsoever.
    • The relevance of annotations has increased dramatically.
    • This is especially true for very large annotated data sets.
    • Many consist of primary data and secondary annotations.
    • Companies have emerged that produce annotated data sets using crowd-workers (e.g., Figure Eight, Crowdee)
    • Key question: how detailed, relevant, correct, meaningful and reliable are these annotations really?
  • 33. Annotations and Events
    • Likes and Favs (user-driven annotation, action)
    • Five-star ratings (user-driven annotation, action)
    • Online comments (user-driven annotation, action)
    • Online reviews (user-driven annotation, action)
    • Clicking an article headline/link (user-initiated event, action)
    • Reading an ebook (user-initiated event, action)
      – Page turns in ebooks are measured – when slow: “boredom”, “disinterest”
      – Next time in the ebook store you’re getting adjusted recommendations
    • No longer reading an ebook (user-initiated event, non-action)
      – Boring chapters where people throw in the towel can be easily identified
      – (Brave new) future: use automatic paraphrasing to re-write the chapter
      – Or maybe NLG and A/B tests – then it’s the original author vs. the machine
  • 34. Annotations in Computational Linguistics
  • 35. Annotations in CL
    • Diverse and specialised tool landscape
      http://annotation.exmaralda.org/index.php?title=Linguistic_Annotation
    • Diverse and specialised format landscape: TEI, NIF, NAF, LAF, TIGER, STTS, FoLiA and many, many others
    • From trivial annotation schemes to extremely complex
    • From low inter-annotator agreement scores to high ones
    • From flexible tools to highly specialised tools
    • From very high quality annotations to very low ones
    • A brief look at a few tools …
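Inter-annotator agreement can be made concrete with a small worked example. The slide does not name a specific measure; the sketch below assumes Cohen's kappa, one commonly used score for two annotators assigning one categorical label per item. The function name and the toy part-of-speech labels are illustrative only.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e the agreement expected by chance from the label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy example: two annotators tag four tokens as noun (N) or verb (V).
annotator_1 = ["N", "N", "V", "V"]
annotator_2 = ["N", "V", "V", "V"]
print(cohens_kappa(annotator_1, annotator_2))
```

In the toy example the annotators agree on three of four tokens (observed agreement 0.75), but half of that agreement is expected by chance given their label frequencies, so kappa comes out at 0.5.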
  • 36. Exmaralda
  • 37. Praat
  • 38. ELAN
  • 39. brat
  • 40. WebAnno
  • 41. Annis
  • 42. Annotations in Language Technology
  • 43. Language Technology
    • Language Technology transfers theoretical results from language-oriented research into technologies and applications that are ready for production use.
    • Uses results from, e.g.:
      – Artificial Intelligence
      – Computer Science
      – Computational Linguistics
      – Natural Language Processing
      – Psychology, Psycholinguistics
      – Cognitive Science
    • Example applications:
      – Spell checkers
      – Dictation systems
      – Translation systems
      – Search engines
      – Report generation
      – Expert systems
      – Dialogue systems
      – Text summarisers
  • 44. Web Annotation Architecture
    The relationship between Web Annotations and Language Technology on a rather general level.
  • 45. Web Annotation Architecture
    Content could be created by Language Technology fully automatically or in a semi-automatic way (text generation).
  • 46. Web Annotation Architecture
    Content could be analysed by Language Technology (semantic analysis, input for ML algorithms etc.).
  • 47. Web Annotation Architecture
    Especially in Social Media Analytics (SMA) we are interested in user-generated content (UGC), i.e., in comments and feedback: “what do users think of a certain product?”
  • 48. Web Annotation Architecture
    • Analysing UGC is difficult and costly (many heterogeneous sources, many different formats)
    • A few established and widely used Web Annotation services would simplify SMA dramatically!
  • 49. Web Annotation Architecture
    We can also use LT methods to create or help create annotations, e.g., in smart authoring scenarios.
  • 50. LT and Web Annotations
    • Analysis of web annotations and exploiting web annotations through Language Technology:
      – Arbitrary web annotations (i.e., unstructured text): no more crawling, aggregating, mapping!
      – Dedicated LT-specific web annotations: annotating language data without any specialised stand-alone tools or data repositories!
    • Generation of web annotations through Language Technology (e.g., to provide background information on important content). Example: Content semantification.
  • 51. Digital Curation Technologies: Prototypically implemented Platform and Services
    [Architecture diagram: Platform for digital Curation Technologies; components shown: Broker, REST API, Curation Service 1, Curation Service 2, External Service 1, External Service 2, clients using the API, Curation Workflow (Input, Output)]

    NLP Interchange Format (NIF), example: “Welcome to Berlin in 2016.”

    # Note: the dfkinif: prefix is used below but is not declared on the slide.
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos/> .
    @prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .

    # The whole document as context
    <http://link.omitted/documents/document1#char=0,26>
        a nif:RFC5147String , nif:String , nif:Context ;
        nif:beginIndex "0"^^xsd:nonNegativeInteger ;
        nif:endIndex "26"^^xsd:nonNegativeInteger ;
        nif:isString "Welcome to Berlin in 2016. "^^xsd:string ;
        dfkinif:averageLatitude "52.516666666666666"^^xsd:double ;
        dfkinif:averageLongitude "13.383333333333333"^^xsd:double ;
        dfkinif:stdDevLatitude "0.0"^^xsd:double ;
        dfkinif:stdDevLongitude "0.0"^^xsd:double ;
        nif:meanDateRange "20160101010000_20170101010000"^^xsd:string .

    # Temporal expression "2016" (characters 21 to 25)
    <http://link.omitted/documents/document1#char=21,25>
        a nif:RFC5147String , nif:String ;
        itsrdf:taIdentRef <http://link.omitted/ontologies/nif#date=20160101000000_20170101000000> ;
        nif:anchorOf "2016"^^xsd:string ;
        nif:beginIndex "21"^^xsd:nonNegativeInteger ;
        nif:endIndex "25"^^xsd:nonNegativeInteger ;
        nif:entity <http://link.omitted/ontologies/nif#date> .

    # Named entity "Berlin" (characters 11 to 17), linked to DBpedia
    <http://link.omitted/documents/#char=11,17>
        a nif:RFC5147String , nif:String ;
        nif:anchorOf "Berlin"^^xsd:string ;
        nif:beginIndex "11"^^xsd:nonNegativeInteger ;
        nif:endIndex "17"^^xsd:nonNegativeInteger ;
        itsrdf:taClassRef <http://dbpedia.org/ontology/Location> ;
        nif:referenceContext <http://link.omitted/documents/#char=0,26> ;
        geo:lat "52.516666666666666"^^xsd:double ;
        geo:long "13.383333333333333"^^xsd:double ;
        itsrdf:taIdentRef <http://dbpedia.org/resource/Berlin> .

    • RDF/OWL-based format for NLP applications
    • Enables interoperability
    • Pure RDF, hence “natural” integration of Linked Data
    • Developed by Universität Leipzig
    • Besides NIF, the platform also supports Web Annotations

    Peter Bourgonje, Julian Moreno-Schneider, Jan Nehring, Georg Rehm, Felix Sasaki, and Ankit Srivastava. “Towards a Platform for Curation Technologies: Enriching Text Collections with a Semantic-Web Layer.” In Harald Sack, Giuseppe Rizzo, Nadine Steinmetz, Dunja Mladenić, Sören Auer, and Christoph Lange, editors, The Semantic Web, number 9989 in LNCS, pages 65-68. Springer, June 2016. ESWC 2016 Satellite Events. Heraklion, Crete, Greece, May 29 - June 2, 2016, Revised Selected Papers.
  • 52. 52 @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . @prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> . @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos/> . @prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> . <http://link.omitted/documents/document1#char=0,26> a nif:RFC5147String , nif:String , nif:Context ; nif:beginIndex "0"^^xsd:nonNegativeInteger ; nif:endIndex "26"^^xsd:nonNegativeInteger ; nif:isString "Welcome to Berlin in 2019. "^^xsd:string ; dfkinif:averageLatitude "52.516666666666666"^^xsd:double ; dfkinif:averageLongitude "13.383333333333333"^^xsd:double ; dfkinif:stdDevLatitude "0.0"^^xsd:double ; dfkinif:stdDevLongitude "0.0"^^xsd:double ; nif:meanDateRange "20190101010000_20200101010000"^^xsd:string . <http://link.omitted/documents/document1#char=21,25> a nif:RFC5147String , nif:String ; itsrdf:taIdentRef <http://link.omitted/ontologies/nif#date=20190101000000_20200101000000> ; nif:anchorOf "2019"^^xsd:string ; nif:beginIndex "21"^^xsd:nonNegativeInteger ; nif:endIndex "25"^^xsd:nonNegativeInteger ; nif:entity <http://link.omitted/ontologies/nif#date>. <http://link.omitted/documents/#char=11,17> a nif:RFC5147String , nif:String ; nif:anchorOf "Berlin"^^xsd:string ; nif:beginIndex "11"^^xsd:nonNegativeInteger ; nif:endIndex "17"^^xsd:nonNegativeInteger ; itsrdf:taClassRef <http://dbpedia.org/ontology/Location> ; nif:referenceContext <http://link.omitted/documents/#char=0,26> ; geo:lat "52.516666666666666"^^xsd:double ; geo:long "13.383333333333333"^^xsd:double ; itsrdf:taIdentRef <http://dbpedia.org/resource/Berlin> . 
NLP Interchange Format (NIF) “Welcome to Berlin in 2019.” • RDF/OWL-based format for NLP applications • Enables interoperability • Pure RDF and, hence, natural integration of Linked Data • Developed by Universität Leipzig • Our platform also supports the Web Annotation data model
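The character offsets in the NIF example (`#char=11,17` for “Berlin”, `#char=21,25` for “2019”) can be reproduced with a few lines of Python. This is only a minimal sketch: the helper names `find_span` and `char_uri` and the `example.org` document URI are assumptions made for illustration, and offsets are counted in Python string positions (Unicode code points).

```python
from typing import NamedTuple


class Span(NamedTuple):
    begin: int  # zero-based offset of the first character
    end: int    # offset just past the last character
    text: str   # the anchored surface string


def find_span(context: str, surface: str) -> Span:
    """Locate the first occurrence of `surface` inside `context`."""
    begin = context.index(surface)  # raises ValueError if absent
    return Span(begin, begin + len(surface), surface)


def char_uri(doc_uri: str, span: Span) -> str:
    """Build an RFC 5147 style fragment identifier for a span."""
    return f"{doc_uri}#char={span.begin},{span.end}"


CONTEXT = "Welcome to Berlin in 2019."
DOC = "http://example.org/documents/document1"  # hypothetical URI

berlin = find_span(CONTEXT, "Berlin")
year = find_span(CONTEXT, "2019")
whole = Span(0, len(CONTEXT), CONTEXT)

for span in (whole, berlin, year):
    print(char_uri(DOC, span), repr(span.text))
```

Each URI produced this way could serve as the subject of triples such as `nif:beginIndex`, `nif:endIndex` and `nif:anchorOf` in the Turtle shown on the slide.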
  • 53. Sector: Journalism Julian Moreno-Schneider, Ankit Srivastava, Peter Bourgonje, David Wabnitz, and Georg Rehm. “Semantic Storytelling, Cross-lingual Event Detection and other Semantic Services for a Newsroom Content Curation Dashboard.” In Octavian Popescu and Carlo Strapparava, editors, Proceedings of Natural Language Processing meets Journalism - EMNLP 2017 Workshop (NLPMJ 2017), Copenhagen, Denmark, 7 September 2017.
  • 54. Sector: TV, Web-TV, Media Georg Rehm, Julián Moreno Schneider, Peter Bourgonje, Ankit Srivastava, Rolf Fricke, Jan Thomsen, Jing He, Joachim Quantz, Armin Berger, Luca König, Sören Räuchle, Jens Gerth, and David Wabnitz. “Different Types of Automated and Semi-Automated Semantic Storytelling: Curation Technologies for Different Sectors”. In Georg Rehm and Thierry Declerck, editors, Language Technologies for the Challenges of the Digital Age: 27th International Conference, GSCL 2017, Berlin, Germany, September 13-14, 2017, Proceedings, number 10713 in Lecture Notes in Artificial Intelligence (LNAI), pages 232-247, Cham, Switzerland, January 2018. Gesellschaft für Sprachtechnologie und Computerlinguistik e.V., Springer.
  • 55. Annotations for a Credible Web
  • 58. Viral Content and Filter Bubbles • Content is often published without checking its validity, discovered through social media and, if it appears relevant, shared immediately. • Content is often shared without reading it. • Goal: virality ➟ reach ➟ clicks ➟ ad revenue • Not all “journalistic” content (or publishing outlets) is really committed to reporting the facts. • Nowadays the burden of fact-checking is with the readers. • “Fake news”: label for several classes of online content. • Can we balance out filter bubble and network effects? Georg Rehm. “An Infrastructure for Empowering Internet Users to handle Fake News and other Online Media Phenomena”. In Georg Rehm and Thierry Declerck, editors, Language Technologies for the Challenges of the Digital Age: Proceedings of the GSCL Conference 2017, Berlin, September 2017. Gesellschaft für Sprachtechnologie und Computerlinguistik e.V. 13.-15. September 2017.
  • 59. Seven classes of false news: (1) Satire or parody; (2) Wrong connection or relation: when title and photos don’t support the content; (3) Misleading content: use of information to put someone or something in a bad light; (4) Wrong context: when genuine content is presented in the wrong context; (5) Deceiving content: imitation of real sources; (6) Bad content with a clear purpose to deceive; (7) Fabricated content: completely untrue, produced to deceive. Characteristics: Clickbait X X ? ? ? • Disinformation X X X X • Political bias ? X ? ? X • Bad journalism X X X. Publisher’s intention: Parody X ? ? • Provocation X X X • Profit ? X X X • Deception X X X X X X • Influence politics X X X X • Influence politics X X X X X. Caption: Different classes of false news and their individual characteristics and intentions (based on Wardle, 2017; Walbrühl, 2017; Rubin et al., 2015; Holan, 2016; Weedon et al., 2017)
  • 60. [Architecture diagram] Website with content • Browser has native support for the infrastructure and aggregates the different scores, messages and values into messages or warnings regarding the content • Tool1, Tool2, Tool3 (detection of hate speech; classification of content by political spectrum; fact checker): decentralised filters process content automatically and send results (MGM) to the browser (important: multilingualism) • Web Annotations DB1 to DB4: decentralised repositories store all annotations • Other users: other users’ annotations (UGA, UGM) • Example: user rates the content quality according to a standardised schema (UGM) • Legend: UGA: user-generated annotations (free text); UGM: user-generated metadata (standardised); MGM: machine-generated metadata (standardised)
  • 61. • Infrastructure as a native part of the web • Necessary for that: support and buy-in from all browser vendors, media publishers and standards • All users need immediate access
  • 62. Tools analyse automatically
  • 63. • Automatic results and free text annotations are stored as Web Annotations. • Users make their annotations available to one another.
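As an illustration of storing a free-text user annotation (UGA) as a Web Annotation, the following minimal sketch builds a JSON-LD document following the W3C Web Annotation Data Model. The function name `make_annotation` and all `example.org` URIs are hypothetical; the talk does not specify how the platform serialises its annotations.

```python
import json
from datetime import datetime, timezone


def make_annotation(page_url, quoted_text, comment, creator, created=None):
    """Build a Web Annotation (JSON-LD) for a free-text comment on a quote."""
    created = (created or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "commenting",
        "creator": creator,  # IRI identifying the annotating user
        "created": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "body": {
            "type": "TextualBody",
            "value": comment,
            "format": "text/plain",
        },
        "target": {
            "source": page_url,
            # Anchor the annotation to the quoted passage on the page.
            "selector": {"type": "TextQuoteSelector", "exact": quoted_text},
        },
    }


annotation = make_annotation(
    page_url="http://example.org/article",       # hypothetical page
    quoted_text="some claim made in the article",
    comment="No source is given for this claim.",
    creator="http://example.org/users/alice",    # hypothetical user IRI
    created=datetime(2019, 2, 21, 12, 0, tzinfo=timezone.utc),
)
print(json.dumps(annotation, indent=2))
```

A machine-generated result (MGM) could use the same envelope with a software agent as `creator` and a structured body instead of plain text.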
  • 64. • Automatic analysis of free text annotations (NLP, IE, RE etc.). • Extraction of opinions, arguments, claims, statements etc.
  • 65. UGM • Standardised metadata schemas for efficient annotations, e.g. “content is intentionally deceptive.” • W3C Provenance Ontology, Schema.org (ClaimReview). • To be used by the human and the machine
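To make the idea of standardised user-generated metadata (UGM) concrete, here is a minimal sketch of a Schema.org `ClaimReview` record built in Python. The property names are taken from Schema.org; the helper `make_claim_review`, the 1 to 5 rating scale and the example values are assumptions for illustration only.

```python
import json


def make_claim_review(claim, page_url, rating, verdict, reviewer_name):
    """Build a Schema.org ClaimReview record as a JSON-LD dictionary."""
    if not 1 <= rating <= 5:  # hypothetical scale: 1 = false, 5 = true
        raise ValueError("rating must be between 1 and 5")
    return {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "claimReviewed": claim,
        "itemReviewed": {"@type": "CreativeWork", "url": page_url},
        "author": {"@type": "Person", "name": reviewer_name},
        "reviewRating": {
            "@type": "Rating",
            "ratingValue": rating,
            "worstRating": 1,
            "bestRating": 5,
            "alternateName": verdict,  # human-readable verdict label
        },
    }


review = make_claim_review(
    claim="Example claim taken from the page",
    page_url="http://example.org/article",  # hypothetical page
    rating=1,
    verdict="content is intentionally deceptive",
    reviewer_name="Alice Example",          # hypothetical reviewer
)
print(json.dumps(review, indent=2))
```

Because the same record is plain JSON-LD, it can be shown to a human reader and processed by a tool alike.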
  • 66. Goal: provide technologies to the user, with which they can consume, assess, analyse, verify and process digital content and media in a better way and that indicate which contents may be problematic.
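How a browser might aggregate the scores delivered by several tools into a single warning can be sketched as follows. The `Score` record, the 0 to 1 scale, the simple mean and the threshold of 0.6 are all assumptions chosen for illustration; the talk leaves the aggregation method open.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass(frozen=True)
class Score:
    tool: str     # name of the tool or user that produced the score
    kind: str     # "MGM" (machine-generated) or "UGM" (user-generated)
    value: float  # assumed scale: 0.0 = unproblematic, 1.0 = problematic


def aggregate(scores: Sequence[Score], threshold: float = 0.6) -> Optional[str]:
    """Return a warning message if the mean score reaches the threshold."""
    if not scores:
        return None
    average = mean(s.value for s in scores)
    if average < threshold:
        return None
    worst = max(scores, key=lambda s: s.value)
    return (
        f"Warning: content may be problematic "
        f"(mean score {average:.2f}; highest: {worst.tool})"
    )


scores = [
    Score("hate-speech-detector", "MGM", 0.9),  # hypothetical tool names
    Score("fact-checker", "MGM", 0.7),
    Score("user-rating", "UGM", 0.5),
]
print(aggregate(scores))
```

A real deployment would likely weight sources differently, for example by trusting some tools or users more than others.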
  • 67. Web Annotation + Fake News • Crowd-sourced Web Annotation content in combination with a set of automatic analysis tools has enormous potential to tackle online misinformation campaigns. • Big impact if deployed widely and implemented correctly. • However, there is a danger of merely shifting the point of attack that misinformation campaigns exploit to the annotations themselves. • The Credibility Coalition has developed a similar approach in parallel, see, e.g., https://web.hypothes.is/blog/annotation-powered-questionnaires/
  • 68. Annotations and Open Science
  • 69. Open Science • Movement to make scientific research, data and dissemination accessible to all levels of an inquiring society, amateur or professional. • Encompasses practices such as publishing open research, campaigning for open access, encouraging scientists to practice open notebook science, and generally making it easier to publish and communicate scientific knowledge. • Connection to: annotations, research data (corpora, LRs), semantics, knowledge, linked data, repositories and other topics. https://en.wikipedia.org/wiki/Open_science
  • 70. Open Science Taxonomy https://en.wikipedia.org/wiki/Open_science
  • 72. Annotations & Open Science • Open Science will soon become the norm and goal in data-intensive science • Important aspects: interoperability, reproducibility, open documentation of experiments, use of standards etc. • Trend: open tools, open workflows, open data sets • Annotations are a crucial piece of the puzzle, especially documented, meaningful annotations • Relevant initiatives: NFDI, EOSC • Relevant principle: FAIR
  • 73. FAIR Principles • TO BE FINDABLE: – F1 (meta)data are assigned a globally unique and eternally persistent identifier. – F2 data are described with rich metadata. – F3 (meta)data are registered or indexed in a searchable resource. – F4 metadata specify the data identifier. • TO BE ACCESSIBLE: – A1 (meta)data are retrievable by their identifier using a standardized protocol. – A1.1 the protocol is open, free, and universally implementable. – A1.2 the protocol allows for an authentication and authorization procedure. – A2 metadata are accessible, even when the data are no longer available. • TO BE INTEROPERABLE: – I1 (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. – I2 (meta)data use vocabularies that follow FAIR principles. – I3 (meta)data include qualified references to other (meta)data. • TO BE RE-USABLE: – R1 (meta)data have a plurality of accurate and relevant attributes. – R1.1 (meta)data are released with a clear and accessible data usage license. – R1.2 (meta)data are associated with their provenance. – R1.3 (meta)data meet domain-relevant community standards.
  • 74. Open Science and … Science • Open Science approaches recommend the use of standards • Only standardised data and metadata are truly interoperable • BUT fundamental research is about inventing NEW things • This contradicts the use of standards as the consensus that was reached within a specific community • However, it does NOT contradict the use of established tools and best practice approaches • Neither does it contradict the modification of standards • At the end of the day, it’s about semantics & documentation • If an established, standardised approach does not work for a new piece of research, invent a new approach or get creative!
  • 75. Annotation of Documents • Open Science will transform research, making it more sustainable, more visible, more transparent • Substantially improved digital infrastructures • This will soon include the annotation of documents, starting with scientific publications (Web Annotation) • First steps towards Open Peer Review (cf. arxiv.org) • Trend: micro-publications (esp. for incremental research) • Will the scientific paper continue to be the atomic unit? • Important relevant initiative: ORKG
  • 76. ORKG • Vision driven forward by Sören Auer (TIB Hannover) • Exchange of scholarly knowledge is primarily document-based: researchers produce articles (online or offline) as coarse-grained text documents. • Transform this predominant paradigm into knowledge-based information flows by representing and expressing knowledge through semantically rich, interlinked graphs. • Sören Auer et al. (2018): “Towards an Open Research Knowledge Graph”. https://doi.org/10.5281/zenodo.1157185
  • 77. Interlinking of Concepts [Page excerpt and figure from Auer et al. (2018)] Automated procedures alone do not achieve the necessary coverage and accuracy; fully manual […] is too time-consuming; librarians lack the necessary domain-specific expertise; and scientists […] necessary expertise in knowledge representation. By combining the four strategies in a meaningful way, they can bring their respective strengths to bear and compensate for the weak points. Figure caption: Interlinking of interdisciplinary and subject-specific concepts and artefacts of scientific work in the different domains (here: TIB subject areas). Figure labels: Linked Open Data Cloud, Semantic Web Standards, Persistent Identifiers, GND, European Open Science Cloud. The Open Research Knowledge Graph (ORKG) provides interlinking, integration, visualization, […], and search functions. It enables scientists to gain a much faster overview of new developments in a specific field and identify relevant research problems. It represents the evolution of […] scientific discourse in the individual disciplines and enables scientists to make their work more […] to colleagues and potential users in industry through semantic description. Figure 3 depicts a research contribution represented in simplified form by a knowledge graph.
  • 78. Annotations and Markup
  • 79. Annotations and Markup • Complex topic – we can only scratch the surface • XML is – unfortunately – considered “done” within W3C, all senior XML specialists have left the organisation. • https://www.balisage.net/Proceedings/vol21/html/Tovey01/BalisageVol21-Tovey01.html – Discussion on the trend from declarative to procedural (!) markup – there’s stagnation in the markup world. • Relevant and timely: https://markupdeclaration.org • Markup is not dead – there’s a small but active and passionate community.
  • 80. Dimensions of Annotations
  • 81. Annotations • Annotation – Definition: Secondary data added to a piece of primary data – in science, this is, often, research data. • The secondary data is, typically, a property of part of the primary research data. • Let’s examine this a bit more closely.
  • 82. Annotations [Diagram: a span of text (“Lorem ipsum dolor sit amet, …”) carries a property, which consists of a label, a value and a pointer to an annotation schema; the annotation schema (possibly external) may constrain or restrict the property] Label of property • Examples: lemma, part of speech, instance-of etc. • What is the conceptual nature of this property? Is it best practice in research or can it be entirely made up? • How many colleagues in the community agree on it? • Is the label adequate and self-explanatory?
  • 83. Annotations Value of property • Examples: adjective, JJ, object, “some free text comment” etc. • The actual annotation payload • Is the value free text or taken from a shared vocabulary? • Is the shared vocabulary prescribed by an annotation schema or ontology? • How many colleagues in the community agree on the value? • How many colleagues in the community agree on the shared vocabulary?
  • 84. Annotations Many annotations • Is there structure among the different properties? • Markup languages, markup grammars • Syntactic structure – Ex.: “HVBXJ” => “AHXB”, “HKVZ” • Semantic, i.e., logical structure – Ex.: “NP” => “DET”, “N”
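The rule “NP” => “DET”, “N” can be read as a small markup grammar that constrains which annotations may be nested inside which. The sketch below checks a nested annotation tree against such rules; the tree encoding as `(label, children)` tuples and the `GRAMMAR` table are assumptions for illustration, not a format proposed in the talk.

```python
from typing import Dict, List, Sequence, Tuple

# A node is (label, children); leaves have an empty child list.
Node = Tuple[str, list]

# Allowed child sequences per label, mirroring "NP" => "DET", "N".
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "NP": [("DET", "N")],
}


def is_valid(node: Sequence) -> bool:
    """Check that every non-leaf node matches one of its allowed rules."""
    label, children = node
    if not children:
        return True  # leaves carry no structural constraint here
    child_labels = tuple(child[0] for child in children)
    if child_labels not in GRAMMAR.get(label, []):
        return False
    return all(is_valid(child) for child in children)


good = ("NP", [("DET", []), ("N", [])])
bad = ("NP", [("N", []), ("DET", [])])
print(is_valid(good), is_valid(bad))
```

In XML terms this corresponds roughly to validating element content against a content model in a DTD or schema.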
  • 85. Annotating Annotations Annotations on annotations (just a few selected points) • Source (machine vs. single human vs. crowd-sourced) • Application scenario: annotations for human vs. machine consumption • Purpose or scope of the annotation (e.g., document structure, layout or style, semantics, rhetorical structure, linguistic properties etc.) – Can the structure be made explicit by the annotation format, maybe via a markup language’s grammar? – Can structure be made explicit through an ontology that is put on top of the individual properties? • Confidence value • Quality indicator (0..1) • Time added, time modified (timestamp) • Style information – how annotations are rendered • Annotation layers – one or multiple layers, independent or interrelated?
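Several of the points above (label, value, schema pointer, source, confidence, quality, timestamps, layer) can be combined in one record type. The following is a minimal sketch; the class name `Annotation`, its field names and the allowed source values are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

ALLOWED_SOURCES = {"machine", "human", "crowd"}


@dataclass
class Annotation:
    begin: int                          # start offset in the primary data
    end: int                            # end offset (exclusive)
    label: str                          # label of the property, e.g. "pos"
    value: str                          # value of the property, e.g. "JJ"
    schema: Optional[str] = None        # pointer to an annotation schema
    source: str = "human"               # machine, human or crowd
    confidence: Optional[float] = None  # confidence value, 0..1
    quality: Optional[float] = None     # quality indicator, 0..1
    layer: str = "default"              # annotation layer
    created: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    modified: Optional[datetime] = None

    def __post_init__(self) -> None:
        if self.source not in ALLOWED_SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")
        for name in ("confidence", "quality"):
            number = getattr(self, name)
            if number is not None and not 0.0 <= number <= 1.0:
                raise ValueError(f"{name} must lie in 0..1")


tag = Annotation(
    begin=11, end=17, label="entity", value="Location",
    schema="http://dbpedia.org/ontology/", source="machine", confidence=0.93,
)
print(tag)
```

Keeping the meta-information on the annotation itself makes it possible to filter, for example, by source or by minimum confidence later on.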
  • 86. Evaluation of Annotations • Measuring inter-annotator agreement • Measuring intra-annotator agreement – what if the same person does the same annotation task again after a week or a month? • Test replicability and reproducibility • Important exercise for: – Emerging annotation formats – Complex annotation exercises – Measuring consensus – Making sure that terms and labels are meaningful
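One common way to measure inter-annotator agreement between two annotators is Cohen's kappa, which corrects the observed agreement for agreement expected by chance. The talk does not name a specific measure, so the choice of kappa and the example labels below are assumptions for illustration.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    # Observed agreement: share of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's label distribution.
    count_a, count_b = Counter(a), Counter(b)
    labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[l] * count_b[l] for l in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


annotator_1 = ["JJ", "NN", "NN", "JJ"]
annotator_2 = ["JJ", "NN", "JJ", "JJ"]
print(cohens_kappa(annotator_1, annotator_2))
```

In this example the annotators agree on 3 of 4 items (0.75), chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5. The same function can be used for intra-annotator agreement by comparing two passes of the same person.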
  • 87. Complexity of Annotations • In (Computational) Linguistics we’ve designed some fairly detailed annotation formats in the last 30 years. • In contrast, many modern data sets (especially for data-driven AI approaches in NLP) are quite shallow. • AI classifiers need enormous amounts of data and just a few high-level labels. • Annotating data at that scale with complex and sophisticated annotation formats is too expensive and therefore not feasible. • Is NLP/AI research forgetting annotation principles? • Are we dumbing down linguistics to the simple annotation of trivial labels? • Has annotation research perhaps become obsolete?
  • 88. Complexity of Annotations • Example: GermEval 2018 data set: tweet, label; tweet, label; tweet, label etc. • There is no structure, no concretisation, no hierarchical information, no additional metadata • Two observations: – there’s a trend towards simply more annotations, i.e., increased quantity while ignoring quality, complexity and structure: complex annotations are expensive and difficult to generalise from. – there’s a trend towards dumb annotations, which are often crowd-sourced: it’s easier to generalise from simple than from structured, hierarchical annotations.
  • 89. Summary and Conclusions
  • 90. Summary • Annotations: from trivial to very complex • From experimental to highly (de facto) standardised • Annotations of annotations • Multi-layer annotations – independent or interrelated • Interoperability and reusability through standards • But: standards vs. flexibility – basic science vs. applied • Nowadays, annotations usually happen on the web • Powerful stack of W3C technologies: Web Annotation, Semantic Web, Linked Data, XML • Web-scale annotations for scholarly publishing • Annotations for Open Science
  • 91. Summary • Language Technology … • … to automate the generation of annotations – Semantification of journalistic/media content – Semantification of scientific content • … to automate the analysis of annotations – Annotations for Open Science • … to restore credibility and trust in the media • In AI, annotations in data sets are often trivial – Trend towards simply more and more annotations – Trend towards more and more simple annotations
  • 92. Annotating Annotations • Different Dimensions of Annotations • Is it possible to tie all dimensions together in a compact, machine-readable way to describe and document an annotation project? – Complexity – Semantics – Source – Impact – Standard – Research Question – Methodology – … • Relevant for Open Science • Relevant for interoperability • Relevant for search & retrieval • Relevant for reproducibility • Relevant for evaluation • Relevant for documentation & repos • Relevant for good scientific practice • … but maybe this is all too complicated because a scientific paper already does the trick in an established way?
  • 93. Four Quadrant Diagram (this diagram is work in progress) • Axes: Basic research vs. Applications and solutions; Humanities research vs. Computer Science and ICT research • Further labels: Number of users: rather small; Number of users: rather high • Points marked in the diagram: X: No need for standardisation, no need to use standards • X: Clear need to use standards for maximum adoption • X: Avantgarde formats, weird phenomena, weird needs, expressibility • X: Performance, standards, interoperability • X: AI • X: Markup, formal languages, querying, overlap • X: Digital Humanities
  • 94. Thank you! Dr. Georg Rehm, Principal Researcher and Research Fellow, Speech and Language Technology Lab, DFKI, Berlin, Germany • georg.rehm@dfki.de • http://georg-re.hm • http://de.linkedin.com/in/georgrehm • https://www.slideshare.net/georgrehm With many thanks to (in alphabetical order): • Ivan Herman (W3C, The Netherlands) • Heather Staines, Jon Udell, Dan Whaley (Hypothes.is, USA)
  • 95. • Georg Rehm, Julian Moreno Schneider, and Peter Bourgonje. Automatic and Manual Web Annotations in an Infrastructure to handle Fake News and other Online Media Phenomena. In Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, and Takenobu Tokunaga, editors, Proceedings of the 11th Language Resources and Evaluation Conference (LREC 2018), pages 2416-2422, Miyazaki, Japan, May 2018. European Language Resources Association (ELRA). • Georg Rehm. An Infrastructure for Empowering Internet Users to handle Fake News and other Online Media Phenomena. In Georg Rehm and Thierry Declerck, editors, Language Technologies for the Challenges of the Digital Age: 27th International Conference, GSCL 2017, Berlin, Germany, September 13-14, 2017, Proceedings, number 10713 in Lecture Notes in Artificial Intelligence (LNAI), pages 216-231, Cham, Switzerland, January 2018. Gesellschaft für Sprachtechnologie und Computerlinguistik e.V., Springer. • Georg Rehm. The Language Resource Life Cycle: Towards a Generic Model for Creating, Maintaining, Using and Distributing Language Resources. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the 10th Language Resources and Evaluation Conference (LREC 2016), pages 2450-2454, Portorož, Slovenia, May 2016. European Language Resources Association (ELRA). • Georg Rehm. Texttechnologische Grundlagen. In Kai-Uwe Carstensen, Christian Ebert, Cornelia Endriss, Susanne Jekat, Ralf Klabunde, and Hagen Langer, editors, Computerlinguistik und Sprachtechnologie - Eine Einführung, pages 159-168. Spektrum, Heidelberg, 3rd edition, 2010. • Georg Rehm, Richard Eckart, Christian Chiarcos, and Johannes Dellert. Ontology-Based XQuery'ing of XML-Encoded Language Resources on Multiple Annotation Layers. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, and Daniel Tapias, editors, Proceedings of the 6th Language Resources and Evaluation Conference (LREC 2008), pages 525-532, Marrakesh, Morocco, May 2008. • Georg Rehm, Andreas Witt, Erhard Hinrichs, and Marga Reis. Sustainability of Annotated Resources in Linguistics. In Lisa Lena Opas-Hänninen, Mikko Jokelainen, Ilkka Juuso, and Tapio Seppänen, editors, Digital Humanities 2008, pages 21-29, Oulu, Finland, June 2008. ACH, ALLC. • Andreas Witt, Georg Rehm, Timm Lehmberg, and Erhard Hinrichs. Mapping Multi-Rooted Trees from a Sustainable Exchange Format to TEI Feature Structures. In TEI@20: 20 Years of Supporting the Digital Humanities. The 20th Anniversary TEI Consortium Members' Meeting, University of Maryland, College Park, October 2007. • Andreas Witt, Oliver Schonefeld, Georg Rehm, Jonathan Khoo, and Kilian Evang. On the Lossless Transformation of Single-File, Multi-Layer Annotations into Multi-Rooted Trees. In B. Tommie Usdin, editor, Proceedings of Extreme Markup Languages 2007, Montréal, Canada, August 2007. • Kai Wörner, Andreas Witt, Georg Rehm, and Stefanie Dipper. Modelling Linguistic Data Structures. In B. Tommie Usdin, editor, Proceedings of Extreme Markup Languages 2006, Montréal, Canada, August 2006.