5th International Scientific Conference “Information Science in an Age of Change: Innovative Information Services”. Faculty of Journalism, Information and Book Studies, Department of Information Studies, University of Warsaw, Warsaw, 15–16 May 2017
Gerhard Budin, University of Vienna: Beyond Accessibility: “Operational Usability” in Virtual Research Environments and Open Learning contexts
2. Motivation and Purpose
Our approach to Innovation in Information Services:
• Trends in society in general and in research &
education in particular
Globalization -> Multilingualism + Transcultural
Contexts of Work; Global Service Industry
Digitization -> E-Learning, Virtual Research
Environments (VREs), Cloud Computing, Big Data, etc.
Despite and because of all this: orientation towards the
person (user, learner, researcher, consumer, citizen) ->
paradigms such as usability/design; cognitive
systems; knowledge organization; multilingual services
3. Our Approach is inspired by Our
Background: What we are doing
Centre for Translation Studies – University of Vienna
• Teaching: B.A./M.A./PhD in Translation Studies,
Transcultural Communication, Language Industry
• Specializations/Professional Profiles: translation
(domain-specific, cultural/literary, multi-media),
interpreting (conference/community i.), terminology
management, machine translation, language industry,
language technologies
• Language Coverage: de, en, fr, es, it, ro, po, pl, ru, cz, cr,
bs, sb, hu, jp, zh, ar, fa, tu, +
• 4000 students, 200 teachers, 20 researchers ->
• Research (applied and fundamental) in all those topics
listed (EU- and national funding) -> cont. next slide
4. Research Focus Nr. 1:
Translation Technologies and
Multilingual Data Processing
The research goal is the further development of so-
called „LT ecosystems“ (LT = language technologies)
that have emerged in recent years at international
level. Such LT ecosystems include
• (1) a diversity of tools for machine translation and
computer-assisted translation and interpreting and
for data processing in international language
industry and
• (2) a broad range of different types of language
resources (text corpora and terminology data; mono-
and multilingual; of different text types; multimodal:
speech, written, video, etc.) that are being used for
various communication purposes in society and that
are the object of investigation by translation studies
scholars.
5. Cont.
Related research topics are, for instance,
• (1) the modeling and representation of terminological
dynamics in different knowledge domains, of linguistic
diversity and of linguistic variation in communication;
• (2) cognitive requirements of different user groups
(usability, accessibility) to language technology tools
and to language resources;
• (3) possibilities and limitations of formalization,
automation, and optimization of translation processes
in international language industry.
• Research goals also include the further development of
new paradigms of machine translation, of the
multilingual semantic web and of the processing of
terminological data using methods of cognitive
informatics.
6. Integration of different paradigms
As a response to the trends mentioned above, we
combine various concepts & approaches:
• language engineering, language technologies,
computational linguistics, language industry
• big data, deep analytics, machine learning, data
mining, data value chains, data science
• cognitive informatics, cognitive science, usability
engineering, design for all
• cross-cultural communication, translation &
interpreting, multilingualism
• At the core: knowledge organization systems,
multilingual terminologies & ontologies
7. Towards a Convergence
of Concepts & Approaches
• Language Engineering, Language Technologies, Language Industry
• Big Data, Deep Analytics, Data Mining, Data Value Chains, Data Science
• Cognitive Informatics, Cognitive Science; Usability, Accessibility, Design
• Cross-cultural communication, translation/interpreting
-> converging on KOS & MTO (knowledge organization systems & multilingual terminologies and ontologies)
8. Properties & Components
Exploiting the strengths of each approach in order to
eliminate the challenges of each (other) approach
Approach | Challenges | Strengths
Big Data | Unstructured data, semantics, cross-cultural differences, implicitness, data silos, lack of data, multilingual data | Economies of scale, speed
Cognitive Computing | Human semantics & cognition | Formal semantics, explicitation
Language engineering | Human semantics & cognition, quality | Managing unstructured data, explicitation, retrieval, consistency, handling multilingualism
Cross-cultural communication & management, translation | Economies of scale, structured data, lack of speed | Unstructured data, semantics across cultures and languages, explicitation
9. Properties & Components
Integrative approach in designing workflow
architectures and application ecosystems based on
• Interoperability of data formats and meta data
• Standardized linguistic (not only semantic)
annotation frameworks implemented in tools
• Policies (incl. legal aspects), plans, tools for data
management and data value chains
• Usability-centered design and implementation
10. Big Data Ecosystem Along
the Data Value Chain
[Figure: big data ecosystem along the data value chain. Source: EBDV-SRIA 2015, p. 36]
11. + Translation workflows
• Text/Data harvesting, automatic corpus building
• Text (incl. speech) recognition/processing
• pre-editing, term extraction, controlled language
• Bi-text alignment, term alignment
• Segmentation, parsing
• Annotation, analysis, information/knowledge creation
• CAT workflow (using translation memories)
• MT core with interaction with LRTs
• Post-editing, revision, quality control
• Terminology management workflows
• Editing/publishing workflow
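The "CAT workflow (using translation memories)" step above can be illustrated with a minimal sketch of a fuzzy translation-memory lookup. It is not taken from the talk: the memory contents, the function name `best_tm_match` and the 0.75 threshold are assumptions made for illustration, and the character-based similarity from Python's `difflib` stands in for the token-aware scoring that real CAT tools use.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segment -> stored translation.
TRANSLATION_MEMORY = {
    "Risk management requires clear terminology.":
        "Risikomanagement erfordert klare Terminologie.",
    "The glossary covers eight languages.":
        "Das Glossar umfasst acht Sprachen.",
}


def best_tm_match(segment, memory, threshold=0.75):
    """Return (source, target, score) for the closest stored segment, or None.

    The threshold is an assumed cut-off below which a match is not offered
    to the translator.
    """
    best = None
    for source, target in memory.items():
        # ratio() is a character-based similarity in [0, 1].
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


if __name__ == "__main__":
    print(best_tm_match("The glossary covers nine languages.", TRANSLATION_MEMORY))
```

A match above the threshold would typically be shown to the translator for post-editing, while segments without a match would go to MT or be translated from scratch.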
12. KOS-MTO for Translation
Workflows
• (automated & adaptive) language resources and
language technologies (LRT) identification, analysis, and
exploitation
• (automated & adaptive) workflow management
• (automated & adaptive) BD analytics and BD value chain
on massive scale for language data (from unstructured
to structured) -> this requires fine-tuned language data
typology with detailed (meta data) type descriptions
• Adaptive use of KOS-MTO at all stages in workflows
• Including additional value creation chains:
• data –> information –> knowledge
• implicit knowledge –> explicit knowledge
• vague semantics –> precise semantics
13. Trans-medial/modal Translation
Workflows
Specific approach needed for these workflows
• Content in each medium and mode of presentation must be
treated properly
• Cross-type transformations properly managed and
automated and embedded in workflows
• A lot of content is multi-medial and multi-modal
• Requires cross-medial/-modal alignment
• Bi-directional transformation and alignment for
text/speech; video/audio; translation/interpreting
• Multilingual Terminological Content&Knowledge
Management (with ontologies in formal back-engine)
14. A concrete example: what
interpreters need…
• Extremely usable/accessible interfaces of
existing language technology applications for
cognitive support in simultaneous-collaborative
searching – data-mining on the fly
• (lots of) Language resources and language
technologies (LRT) at hand in any situation (any
time, anywhere) in an integrated, seamless,
accessible way, incl. adaptive & mobile
technologies, collaborative environments (cloud
& crowd computing), and MT integrated in complex
trans-medial workflows in real time
15. Language Big Data Ecosystem
Roles in this ecosystem:
• Translators/interpreters are both users and producers of
language data, of information and of knowledge (of
LRTs) in multiple languages for specified purposes –>
thus constantly adding language data for re-use, feeding
it into the language big data ecosystem
• LRT creators/providers increasing their accessibility
• Clients of translators/interpreters in industry, trade,
public services, different branches and domains
• Terminology, content and KOS managers
• There are no longer any mere „end users“ –>
they communicate and add language data to the ecosystem
• The challenge is to get hold of as much data as possible
16. The „Term-Ecosystem“
• Terminologies are conceptual-cognitive knowledge
organization and semiotic communication systems within
and across domains – with different degrees of
structurization, formalization, and explicitation
• Terminologies are collected as mono- or multilingual
lexical LRs, ideally accompanied by knowledge-rich data
(i.e. definitions, contexts, etc.), and can be transformed
into terminological ontologies (from TBX to SKOS, OWL,
etc. accompanied by semantic formalization and
explicitation)
• There are terminology producers (domain experts),
collectors/curators (terminologists), translators (users,
secondary producers, enhancers), lexicographers creating
LRs such as glossaries, term bases, dictionaries, etc.
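The transformation "from TBX to SKOS" mentioned above can be sketched as follows. This is a simplified illustration, not the project's converter: the entry is a hand-made dictionary rather than parsed TBX, the base URI and the names `EXAMPLE_ENTRY` and `entry_to_skos_turtle` are hypothetical, and the definition text is invented. Only the SKOS properties `skos:prefLabel`, `skos:altLabel` and `skos:definition` come from the SKOS vocabulary itself.

```python
# Hypothetical concept-oriented terminology entry (one concept, terms per language).
EXAMPLE_ENTRY = {
    "id": "risk",
    "terms": {"en": ["risk"], "de": ["Risiko"], "pl": ["ryzyko"]},
    "definitions": {"en": "Possibility that an event with negative consequences occurs."},
}


def escape_literal(text):
    """Escape backslashes and double quotes for a single-line Turtle literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def entry_to_skos_turtle(entry, base_uri="http://example.org/term/"):
    """Serialize one entry as a skos:Concept in Turtle.

    Assumes at least one term and single-line strings. The first term per
    language becomes the preferred label, further terms become altLabels.
    """
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{base_uri}{entry['id']}> a skos:Concept ;",
    ]
    statements = []
    for lang, terms in sorted(entry["terms"].items()):
        statements.append(f'    skos:prefLabel "{escape_literal(terms[0])}"@{lang}')
        for synonym in terms[1:]:
            statements.append(f'    skos:altLabel "{escape_literal(synonym)}"@{lang}')
    for lang, definition in sorted(entry.get("definitions", {}).items()):
        statements.append(f'    skos:definition "{escape_literal(definition)}"@{lang}')
    lines.append(" ;\n".join(statements) + " .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(entry_to_skos_turtle(EXAMPLE_ENTRY))
```

Moving further towards OWL would require the "semantic formalization and explicitation" named in the slide, for example typed relations between concepts, which this sketch does not attempt.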
17. Usability Testing, Evaluation
& Assessment standards
UTEA - Ben Brooks & Andreas Lüdtke 2014
ISO 9241-210:2010 provides requirements and recommendations for human-centred design
principles and activities throughout the life cycle of computer-based interactive systems. It
is intended to be used by those managing design processes, and is concerned with ways in
which both hardware and software components of interactive systems can enhance
human–system interaction.
18. Part II: Selected R&D projects
„cognitive systems“ approach, including
multilingual interfaces, multilingual content,
cross-lingual services
design for all, user-centered, accessible (based
on diversity management framework)
Personalized, cognitive design of workflows
EU-projects (H 2020, Erasmus+, Eureka)
Austrian National Research Funds (FWF, FFG)
19. Cognitive Systems –
components and requirements
Epistemological foundation: cognitive systems
conceived as adaptive socio-technical systems that are
processing information for specified purposes
Integrative approach enabling such systems to operate
– at different (semi-)automated, interpretative and transformative
levels: data – information – knowledge
– across multiply heterogeneous environments on the basis of
syntactic, semantic, and pragmatic interoperability
– as adaptive and ergonomic decision-support systems in dynamic
workflows and diverse communicative situations
20. Cognitive Systems –
components and requirements
Main components:
– Cognitive ontologies
• Conceptual representations of domain knowledge and of the
(formalized) rules and operations of knowledge processing and
management
• Multilingual concept representations including
– CROSS-CULTURAL AND CROSS-LINGUAL SEMANTIC
ASYMMETRIES
– SYNTACTIC, SEMANTIC, AND PRAGMATIC VARIABILITY
• Adaptive multi-modal displays for human use in virtual research
environments
• Enhanced automatized transformative power using cognitive
informatics methods and tools
21. Case Study 1: Cognitive System
for Global Risk Management
State-of-the-art of the project with the following
components up and running:
– An 8-language multilingual glossary for risk management
– A multilingual terminology database
– A frame-based semantic model of the risk management
terminology that has been used for creating a hypertext
environment
– Corpus linguistic annotations on the lexical units for
computational lexical processing
– A full-text corpus comprising hundreds of texts in several
languages on the domains covered in the risk project
– A framework for a (multilingual) risk ontology
22. Current work and next steps
Component integration in a cognitive system
– based on a cognitive ontology that is designed according to
cognitive user ergonomics in diverse work situations
Introducing cognitive informatics methods
– for automated conceptual knowledge engineering, in particular
the work of Y. Wang et al. on concept algebra, the object-attribute-
relation (OAR) model and algorithms of machine concept
elicitation (AMCE)
Applying these methods to the multi-domain risk
management knowledge systems
– by using terminologies, taxonomies, thesauri, ontologies and
other knowledge organization systems (KOS) with different
degrees of formalization and by using the full text corpus for
machine learning and concept elicitation
23. A visualization of the risk scenario frame produced with FrameGrapher:
https://framenet2.icsi.berkeley.edu/FrameGrapher/grapher.php
24. Part of an ontology in the risk domain, generated by a tool
used in the project, the Altova system “SemanticWorks”
25. Using BabelNet for linguistic
data acquisition and
comparison/mapping
“BabelNet encodes knowledge as a labeled directed graph G = (V, E) where V is
the set of nodes – i.e., concepts such as play and named entities such as
Shakespeare – and E ⊆ V × R × V is the set of edges connecting pairs of concepts
(e.g., play is-a dramatic composition). Each edge is labeled with a semantic
relation from R, i.e., {is-a, part-of, ..., ε}, where ε denotes an unspecified
semantic relation. Importantly, each node v ∈ V contains a set of lexicalizations
of the concept for different languages, e.g., {play_EN, Theaterstück_DE, dramma_IT,
obra_ES, ..., pièce de théâtre_FR}. We call such multilingually lexicalized concepts
Babel synsets. Concepts and relations in BabelNet are harvested from the largest
available semantic lexicon of English, WordNet, and a wide-coverage
collaboratively-edited encyclopedia, Wikipedia (introduced in Section 2). In
order to build the BabelNet graph, we collect at different stages:
a. From WordNet, all available word senses (as concepts) and all the lexical and semantic pointers
between synsets (as relations);
b. From Wikipedia, all encyclopedic entries (i.e., Wikipages, as concepts) and semantically
unspecified relations from hyperlinked text.”
BabelNet: The automatic construction, evaluation
and application of a wide-coverage multilingual
semantic network (Navigli/Ponzetto in Artificial
Intelligence 193 (2012) 220)
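The graph definition quoted above (nodes carrying lexicalizations per language, edges as labeled triples from V × R × V) can be made concrete with a small data-structure sketch. This is not the BabelNet API: the class and function names, the identifiers and the tiny example graph are assumptions for illustration; only the lexicalizations of "play" and the is-a edge are taken from the quotation.

```python
from dataclasses import dataclass, field

UNSPECIFIED = "ε"  # label for a semantically unspecified relation


@dataclass
class Synset:
    """A node v in V: one concept with its lexicalizations per language."""
    synset_id: str
    lexicalizations: dict = field(default_factory=dict)  # language code -> list of words


@dataclass
class SemanticNetwork:
    """A labeled directed graph G = (V, E) with E as a set of (source, relation, target)."""
    nodes: dict = field(default_factory=dict)  # synset_id -> Synset
    edges: set = field(default_factory=set)

    def add_synset(self, synset):
        self.nodes[synset.synset_id] = synset

    def add_edge(self, source_id, relation, target_id):
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both endpoints must be nodes of the graph")
        self.edges.add((source_id, relation, target_id))

    def related(self, source_id, relation):
        """Return the sorted target ids reachable from source_id via relation."""
        return sorted(t for s, r, t in self.edges if s == source_id and r == relation)


def build_example():
    """Build a three-node example graph (illustrative data only)."""
    net = SemanticNetwork()
    net.add_synset(Synset("play", {
        "en": ["play"], "de": ["Theaterstück"], "it": ["dramma"],
        "es": ["obra"], "fr": ["pièce de théâtre"],
    }))
    net.add_synset(Synset("dramatic_composition", {"en": ["dramatic composition"]}))
    net.add_synset(Synset("shakespeare", {"en": ["Shakespeare"]}))
    net.add_edge("play", "is-a", "dramatic_composition")
    # An unlabeled, hyperlink-style relation is stored with the ε label.
    net.add_edge("play", UNSPECIFIED, "shakespeare")
    return net


if __name__ == "__main__":
    network = build_example()
    print(network.related("play", "is-a"))
    print(network.nodes["play"].lexicalizations["de"])
```

Storing the lexicalizations on the node rather than on the edges mirrors the quoted design, in which a single concept is shared across languages and comparison or mapping of terms happens at concept level.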
26. Case Study 2:
MOA
My Own Agency Connector – Connecting freelance
translators to content management systems and
statistical machine translation
• Project duration: February 2015 until December 2016
• Research area: Translation platform
• Funding body: Eurostars
• Project leader: Centre for Translation
Studies, University of Vienna
• Cooperation partner:
Nativy GmbH
27. MOA - Background
There is a need for a connection between
individual platforms of freelance translators and
closed content management systems of
companies/clients.
• Access to a client‘s content that should be translated
is difficult
28. MOA - Aims
The main aims of MOA are:
• It enables translators to collaborate and cooperate
with each other
• Translators can reach clients directly, each acting as
their own translation platform
• MOA Connector brings the strength of professional
human translators into enterprise portals
• It acts as a gateway for submitting, tracking and
reviewing translation projects
29. MOA - Outputs
The main output is the MOA Connector:
• a platform enabling
• clients to commission professional human translations
• directly from freelance translators
• working in a network of hundreds of colleagues
• providing hundreds of language combinations
• Usability analysis – and optimization
• with better technology:
• internal machine translation system
• computer-assisted translation system
• terminological database
30. Case Study 3: European Training in
Accessibility to Live Events
• The ACT project (ERASMUS+) defines a new
professional profile:
• Media Accessibility Expert/Manager for the Scenic Arts
• training activities for such professionals
• Context and Motivation:
• Full participation of all citizens in cultural events, as end
users or participants, must be the norm
• Equal opportunity & access to culture are HUMAN RIGHTS
31. Specific objectives (1)
ACT strengthens cooperation between organisations in
different but complementary sectors with a view to
establishing exchanges of practices.
ACT cooperates with regional authorities and is
integrated in actions of local and regional
development, with an emphasis on scenic arts.
ACT adapts curricula to current and emerging labour
market needs, by promoting active cooperation
between HEI and partners from outside academia.
32. Specific objectives (2)
ACT triggers development, testing and
implementation of innovative practices in the
field of education, while better preparing the
education and training professionals for equity,
diversity and inclusion challenges.
ACT fosters recognition and validation of knowledge,
skills and competences acquired through
various types of learning, by developing
innovative certification methods.
35. Case Study 4: Language
Technology Observatory
(LT_Observatory)
• Horizon 2020, Coordination and Support
Action
• Duration: 2015-2016
• http://lt-observe.eu
36. Language Technology
Observatory
(LT_Observatory)
Objectives:
• Identification of language resources in existing pools
based on pre-defined user needs
• Bring together different stakeholders from the language
community
• Identify national language strategies and funding
sources to realize the Digital Single Market
• Create the on-line LT Observatory as a sustainable
structure for access to the LT ecosystem
37. LT_Observatory:
3 services for the LT community:
1) A Catalogue of selected Language Resources
that meet the requirements of machine
translation professionals in an operational
context: accessibility, openness or clear
licensing scheme, domain coverage, Dublin
Core metadata.
38. LT_Observatory:
3 services for the LT community:
2) Public Policy Observatory that provides
information about language technology
policies of Member States and about
funding opportunities investigated at EU, national and
regional level, with practical information about
contacts and open calls.
39. LT_Observatory: 3 services
for the LT community:
3) LT-Observe provides ongoing coverage of the
language technology markets and policy
issues through three news channels: LangTech
News, CITIA News, LangPol News.
40. Operational Usability for Language
Resources for Machine Translation
for the Digital Single Market
Purpose: to unlock the value of existing resources by
increasing accessibility and practical usability
• Gold standard defined by selected „ideal“ resources
• Critical selection criteria (size, speed/ease of access,
domain relevance, language range, processing costs,
speed of implementation, etc.)
• Practical set of metrics to evaluate LRs
• Different (usability) levels of compliance
• To be used as a basis for open-sourcing
• For upgrading LRs of lower usability levels
• LR alignment for multilingual expansion
• Start projects to do all this
• -> LR infrastructure on the basis of operational usability
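One way to picture a "practical set of metrics" with "different (usability) levels of compliance" is a weighted score over the selection criteria listed above. The sketch below is purely illustrative: the slide names the criteria but gives no weights, scales or thresholds, so the equal weights, the 0 to 1 rating scale, the level names and the cut-offs 0.8 and 0.5 are all assumptions, as are the function names.

```python
# Criteria named on the slide; the equal weights are an assumption.
CRITERIA_WEIGHTS = {
    "size": 1.0,
    "ease_of_access": 1.0,
    "domain_relevance": 1.0,
    "language_range": 1.0,
    "processing_costs": 1.0,
    "speed_of_implementation": 1.0,
}


def usability_score(ratings, weights=CRITERIA_WEIGHTS):
    """Weighted mean of per-criterion ratings, each expected in [0, 1]."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[name] * ratings[name] for name in weights) / total_weight


def compliance_level(score):
    """Map a score to a hypothetical compliance level (thresholds are assumed)."""
    if score >= 0.8:
        return "gold standard"
    if score >= 0.5:
        return "usable"
    return "needs upgrading"


if __name__ == "__main__":
    example = dict.fromkeys(CRITERIA_WEIGHTS, 0.25)
    print(compliance_level(usability_score(example)))
```

Resources landing in the lowest level would be the candidates for the upgrading step listed on the slide.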
41. Case Study 5: VRE collaborative
linguistics research
A Digital Humanities project: designing and building a
virtual research environment (VRE) for linguistic studies,
i.e. for the study of the use of the German language in
Austria (DiÖ) with a focus on variation, perception and
dynamic change
Distributed project teams working for 4 years on 11
different sub-projects and dozens of sub-tasks (about 100
persons involved in different researcher roles)
Functional model in form of a cognitive ontology: data
(object) types, attributes, and their operations
Cognitive ergonomics and functional study: what do
researchers need in which workflows
42. Collaborative Online Research
Platform “German in Austria”
• FWF, Special Research Programme (SFB)
• Duration: 2016-2019
• Partners: University of Vienna, University of
Graz, University of Salzburg
• http://dioe.at
43. Collaborative Online Research
Platform “German in Austria”
Aims:
• Support researchers throughout the whole
research cycle;
• Sustainable preservation of research data
and outcomes
44. Collaborative Online Research
Platform “German in Austria”
Outcomes:
• Collaborative research platform;
• Collaborative annotation framework;
• Interoperability of annotation schemes,
corpora, annotated data, and workflows in
collaborative research.
45. Collaborative Platform: “German
in Austria”
[Diagram of platform components: data; DiÖ-data; dissemination material; publications; transcription; annotation; analysis; peer discussion; collaborative corpus building; collaborative writing; repository; resources]
49. Data Processing
POS tagging and lemmatization output (token ID, token, POS tag, lemma):
t27 Heuer ADV heuer
t28 war VAFIN sein
t29 ich PPER ich
t30 fischn ADJD <unknown>
t31 in APPR in
t32 Malnitz NE <unknown>
t33 . $. .
t34 Mhm NN <unknown>
t35 Eine ART ein
t36 Woche NN Woche
t37 , $, ,
t38 Weu ADJD <unknown>
t39 i ADJD <unknown>
t40 bin VAFIN sein
t41 auch ADV auch
t42 Fliegenfischer NN <unknown>
TEI-encoded transcript:
<div>
  <u who="#SPK1">
    <anchor synch="#T37"/>
    Oh Gott, o gott. Heuer war
    ich fischn in Mallnitz.
    <anchor synch="#T38"/>
  </u>
</div>
<div>
  <u who="#SPK2">
    <anchor synch="#T37"/>
    PP
    <anchor synch="#T38"/>
  </u>
</div>
<div>
  <u who="#SPK0">
    <anchor synch="#T39"/>
    Mhm
    <anchor synch="#T40"/>
  </u>
</div>
Labels shown in the diagram:
• repository
• Digitalisation of material
• harmonisation in line with TEI (Text Encoding Initiative) standards
• adaptation of the tag set and training of the POS tagger on ‘German in Austria’, especially the spoken variety
• data conversion into XML
• POS tagging, lemmatizing
• storing, archiving, publishing, reusing
• Meta-data & workflow management
• Data ontology ID-REF
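To show how data in these two shapes could be read programmatically, here is a minimal sketch using only Python's standard library. It is an illustration, not the project's pipeline: the function names are hypothetical, the wrapping `<body>` element is added so the fragment has a single root, and the fragment is treated as namespace-free as it appears on the slide (full TEI files normally declare the TEI namespace, which would require namespace-qualified tag names).

```python
import xml.etree.ElementTree as ET

# Two utterances from the slide, wrapped in a single root element.
TEI_FRAGMENT = """
<body>
  <div>
    <u who="#SPK1">
      <anchor synch="#T37"/>
      Oh Gott, o gott. Heuer war
      ich fischn in Mallnitz.
      <anchor synch="#T38"/>
    </u>
  </div>
  <div>
    <u who="#SPK0">
      <anchor synch="#T39"/>
      Mhm
      <anchor synch="#T40"/>
    </u>
  </div>
</body>
"""


def extract_utterances(xml_text):
    """Return speaker, timeline anchors and normalized text for each <u> element."""
    root = ET.fromstring(xml_text.strip())
    utterances = []
    for u in root.iter("u"):
        anchors = [a.get("synch") for a in u.findall("anchor")]
        # The spoken text sits between the anchors; collapse all whitespace.
        text = " ".join("".join(u.itertext()).split())
        utterances.append({
            "speaker": u.get("who", "").lstrip("#"),
            "anchors": anchors,
            "text": text,
        })
    return utterances


def parse_token_line(line):
    """Split one tagger output line into token ID, token, POS tag and lemma."""
    token_id, token, pos, lemma = line.split()
    return {"id": token_id, "token": token, "pos": pos, "lemma": lemma}


if __name__ == "__main__":
    for utterance in extract_utterances(TEI_FRAGMENT):
        print(utterance)
    print(parse_token_line("t28 war VAFIN sein"))
```

The `<unknown>` lemmas in the tagger output are where adapting the tag set and retraining the tagger on spoken Austrian German, as named in the diagram, would be expected to help.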
50. Current work and next steps
Developing the cognitive ontology, drawing on and
applying the computational cognitive linguistics framework
by Wang and Berwick (2012) for semi-automatic operations,
including
– Ontology of linguistic data objects
– Ontology of data processing operations (and rules)
– Ontology of (interpretative) research operations
– Ontology of collaborative workflow operations
VRE 1.0 up and running since January 2017 with continuous
expansions and improvements until end of 2019
Usability testing and evaluation ongoing in our usability lab
for continuous improvement of workflows, interfaces,
functionalities to ensure operational usability of the VRE
51. Case study 6: Open Discovery Space
FP 7 Integrated Project 2012-2016
With 50 partners all over Europe
ODS: OPEN DISCOVERY SPACE VISION
Open Discovery Space is an Open Innovation platform for K-
12 teachers facilitating educational content creation, sharing
and retrieval of Open Education Resources, as well as
networking and collaborative space among teachers, parents,
content and technology providers and policy makers.
ODS Community portal provides a socially powered and
multilingual Open Learning infrastructure to boost adoption
of eLearning resources.
53. Didactic design of multilingual
E-Learning content
Various steps
• Learning ontologies
• Learning paths
• Learning content modeling & packaging
• Learner modeling
• Didactic modeling
• Curriculum design
• E-Learning system design/adaptation
• Adaptive design for different types of devices,
interfaces
• Cross-cultural issues to be taken into account –
different learning cultures
54. Conclusions and Outlook
Accessibility – this concept has become a powerful strategic agenda
in legislation and best practice for the whole spectrum of
dimensions of diversity management (not only for persons with
physical or mental impairments or challenges, but also for all other
diversity dimensions: age, level of education, linguistic identity,
ethnic identity, religious beliefs, sexual orientation, etc.)
Operational Usability – has also become a powerful concept and is
pro-actively being used in policy development at EU level and
national levels, and has become a topic of R&D (as has accessibility)
Digital Humanities Research Infrastructures also transform library
services, e.g. „library labs“ designed as research labs and learning
labs, where operational usability and accessibility are already part
of the design process
-> a source of Innovation for Information Services