Developing effective terminological resources for commercial use
or
Narrowing the gap between termbases and corpora in commercial environments

Kara Warburton
CHAT 2013

© 2013 by Termologic. All rights reserved.

1. Motivation of my research
2. Does commercial language contain terminology?
3. What is a term?
4. Our assumption about termbases and corpora
5. Aim and methodology of our research
6. Corpus-valid terms
7. The gap between termbases and corpora
8. Causes of the gap
9. Keywords and their potential
10. Conclusions

Personal motivation
● The established principles and methodologies of terminology management don't seem to “fit” the needs of commercial uses of terminology

How do I resolve this apparent conflict?
● Study how terminology is managed in commercial settings
● Identify key issues, gaps with mainstream methodology and theory

Mainstream theory and practice                               Commercial needs
Strict ties to translation; restricted focus of termbases    Polyvalent
Normative                                                    Prescriptive and descriptive
Onomasiological                                              Largely semasiological
Thematic                                                     Ad-hoc
Univocal                                                     Multivocal
Objectivist, concept focus                                   Communicative, language focus
Philosophical, social concern                                Commercial concern

What is terminology?
● (Terminology is) the science studying the structure, formation, development, usage and management of terminologies in various subject fields.
● (A terminology is a) set of designations belonging to one special language.
● (A special language is) a language used in a subject field and characterized by the use of specific linguistic means of expression.
● (A subject field is) a field of special knowledge.

(ISO 1087-1, 2000)

According to these definitions
● An LSP (special language) contains terminology.
● Key criteria for LSP:
  – Subject field
  – Specific linguistic means of expression
● Therefore:
  – If commercial language is an LSP, then it contains terminology.
  – Commercial language is an LSP if it:
    ● can be viewed as a type of subject field
    ● has specific linguistic means of expression

What is a subject field?
What is “special” knowledge?
● Pure and applied sciences, techniques, technologies, specialized activities
● Professional activities carried out in business, industry, companies, and professional settings
● Any specialized activity carried out by humans

What are “specific linguistic means of expression”?
● Textual characteristics
  – concision, precision, depersonalisation, economy, referentiality, preponderance of nominal structures, dominance of written form
● Communicative situation: formal, professional
● Communicative purpose
  – inform, educate
  – objective, precise, concise, and unambiguous exchange of information
● Conscious acquisition

Commercial language is an LSP
● Describes tangible products, services, and activities, often within one vertical industrial or economic sector, which could be viewed as a subject field
● Adheres to specific linguistic rules and styles; many companies have a style guide, and some enforce its rules automatically through controlled authoring software
● Written form predominates
● Informative purpose

What is a term?
● General theory: the designation of an object, the conceptualization of which can be classified into a system of concepts
● Socio-cognitive theory: a natural language representation of a unit of understanding, considered relevant to given purposes, applications, or groups of users
● Lexico-semantic theory: a construct that takes shape through an analysis which gives consideration to corpus evidence, subject-matter relevance, and the purpose of the terminographical product
● Textual theory: a semantically-charged linear structure that contributes to texture (coherence and cohesion) in an LSP text
● Communicative theory: all of the above

What is a term for commercial terminography?
● Semantic membership in a subject field is a guiding criterion
● But bringing benefit to the company is the primary criterion
● Companies have diverse needs, requiring diverse types of terminological resources
● ......

Applications of terminology
● computer-assisted translation
● controlled authoring
● content management, automatic content classification
● product classification
● indexing, SEO, keyword management, etc.

EACH of these applications requires a HIGH LEVEL of correspondence between the termbase and the company corpus.

A term is...
● ANY lexical unit that can bring benefit to the company by being “managed” is a candidate “term”. This MAY include:
  – General lexicon words
  – Phrases
  – TM segments
  – Proper nouns
  – Variants
  – Non-nouns, especially verbs

Aim of our research
● Compare termbases and corpora in four IT companies to see how well they (the terms) correspond
● Establish the scope of the gap
● Explain the gap
● Identify ways to reduce the gap

Methodology of the research
● Obtain termbases in export files from 4 different systems
● Convert to TBX
● Import to TermWeb
● Apply necessary filters for different evaluations
● Obtain and prepare corpora for analysis
● Export corpus-valid terms from termbase
● Run batch concordance of termbase terms
● Statistically analyze results
● Identify patterns, investigate solutions, including keywords and DICE ranking
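
The batch concordance step in the list above can be pictured as counting, for every exported termbase term, how often it occurs in the corpus. The following is only a minimal sketch: it assumes a plain-text corpus, case-insensitive whole-word matching and no lemmatization, and the function names (`batch_concordance`, `absent_share`) are hypothetical. The slides do not say which concordancing tool or matching rules the study actually used.

```python
import re
from collections import Counter


def batch_concordance(terms, corpus_text):
    """Count whole-word, case-insensitive occurrences of each term (assumed matching rule)."""
    counts = Counter()
    for term in terms:
        # Word boundaries keep "data source" from matching inside "data sources".
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = len(pattern.findall(corpus_text))
    return counts


def absent_share(counts):
    """Share of terms with zero corpus hits (one possible reading of 'gap')."""
    if not counts:
        return 0.0
    return sum(1 for hits in counts.values() if hits == 0) / len(counts)


corpus = (
    "The data source feeds the cluster. "
    "A bad sector is not a cluster. Data sources vary."
)
hits = batch_concordance(["cluster", "data source", "bad cluster"], corpus)
```

Because no lemmatization is applied, inflected forms such as plurals are not counted, which would understate the frequency of many terms in a real corpus.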

Profile of the companies
● HQ in USA but global presence, all in IT sector
● Company A
  – Field: statistics
  – 330 employees
  – Across Language Server, CrossDesk, CrossAuthor, xMetal, CrossTerm
● Company B
  – Field: business analytics
  – 13,000 employees
  – Acrolinx IQ, in-house CAT tools, TermWeb

Profile of the companies
● Company C
  – Field: information security, storage, management
  – 18,500 employees
  – SDL WorldServer, Acrolinx IQ
● Company D
  – Field: hardware (PCs, servers, printers, networking), software
  – 330,000 employees
  – SDL Trados and MultiTerm

Size of data

     Corpus size in tokens   Terms from termbase   Size of corpus in relation to termbase
1    3,973,265               1,777                 2,236
2    19,808,928              6,441                 3,075
3    22,136,564              4,195                 3,074
4    400,777                 4,385                 91
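
The last column of the table appears to be the number of corpus tokens per termbase term. The short sketch below recomputes it under that assumption; the reading of the column and the helper name `tokens_per_term` are not stated in the slides.

```python
# Figures copied from the "Size of data" table: (corpus tokens, termbase terms).
rows = {
    1: (3_973_265, 1_777),
    2: (19_808_928, 6_441),
    3: (22_136_564, 4_195),
    4: (400_777, 4_385),
}


def tokens_per_term(tokens, terms):
    """Corpus tokens per termbase term, rounded to the nearest integer."""
    return round(tokens / terms)


ratios = {row: tokens_per_term(*figures) for row, figures in rows.items()}
```

Rows 1, 2 and 4 reproduce the slide's figures under this reading. Row 3 as transcribed here does not (it comes out near 5,277 rather than 3,074), so one of that row's numbers may have been garbled in transcription.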

The gap between termbases and corpora

Range 0 + A:
Company A    35%
Company B    63%
Company C    73%
Company D    76%

Causes of the gap
A) Under-performing termbase terms
  – termbase terms that are absent or are infrequent in the corpus (generally, redundant terms)
B) Under-documented corpus terms
  – corpus terms that are either entirely missing from the termbase (nonextant terms) or are in insufficient number in the termbase (infrequent terms).

Under-performing termbase terms
● Upper-case terms
● Excessively long terms
● TM segments
● Terms with unessential modifiers (boundary setting problem)
● Terms with proper name modifiers

Term boundary problems

Nonextant term                       Adjusted term              Frequency
bad cluster                          cluster                    8,490
automatic incremental backup         incremental backup         521
sequential mean squares              mean squares               129
absolute correlation coefficient     correlation coefficient    330
individual fitted values             fitted value               270
active data source                   data source                7,201
critical success factor component    critical success factor    540
printhead failure                    printhead                  275
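
One way to picture the adjustment shown in the table is to take a termbase term with no corpus hits and look for its longest contiguous sub-sequence of words that does occur. The sketch below is a hypothetical heuristic, not the procedure used in the study: the function names are invented, matching is case-insensitive and whole-word, and it does not handle inflection (it would not turn "individual fitted values" into "fitted value").

```python
import re


def count_hits(term, text):
    """Whole-word, case-insensitive hit count (assumed matching rule)."""
    return len(re.findall(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE))


def adjust_boundary(term, text):
    """Return (sub_term, hits) for the longest word sub-sequence found in text, or None."""
    words = term.split()
    n = len(words)
    # Try shorter and shorter candidates; within one length prefer the most frequent.
    for length in range(n - 1, 0, -1):
        scored = []
        for start in range(n - length + 1):
            candidate = " ".join(words[start:start + length])
            hits = count_hits(candidate, text)
            if hits > 0:
                scored.append((candidate, hits))
        if scored:
            return max(scored, key=lambda pair: pair[1])
    return None


sample = (
    "Each cluster is checked and the cluster map is rebuilt. "
    "The incremental backup runs nightly; another incremental backup runs weekly. "
    "A bad sector was found."
)
```

With this sample text, `adjust_boundary("bad cluster", sample)` selects "cluster" and `adjust_boundary("automatic incremental backup", sample)` selects "incremental backup". Any such suggestion would still need review by a terminologist before the entry is changed.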

The cost of redundant terms
● IT cost
● Reduced efficiency due to dilution of “good” entries
● Cost of creating and maintaining the entries

Under-documented corpus terms
● Variants
● Non-nouns (particularly verbs)
● Homographs
● Terms with optimally-set boundaries
● Multi-word terms containing a keyword
● Adjectives that are productive in forming MWTs

Keywords
● A word that is unusually frequent in the corpus and is therefore likely to be a domain-specific unigram term
● Determined by comparing word-frequency lists from a domain corpus and a general purpose (reference) corpus
● Good indicators of the key topics of a corpus
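
The comparison of frequency lists described above can be sketched with a keyness statistic. The slides do not name the statistic used, so the example below assumes the log-likelihood measure that is common in corpus linguistics; the function names and the toy token lists are illustrative only.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Keyness of a word: a, b = frequency in domain / reference corpus; c, d = corpus sizes."""
    expected_domain = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    total = 0.0
    if a > 0:
        total += a * math.log(a / expected_domain)
    if b > 0:
        total += b * math.log(b / expected_reference)
    return 2 * total


def keywords(domain_tokens, reference_tokens, top_n=10):
    """Rank words that are relatively more frequent in the domain corpus."""
    domain = Counter(domain_tokens)
    reference = Counter(reference_tokens)
    c = sum(domain.values())
    d = sum(reference.values())
    scored = []
    for word, a in domain.items():
        b = reference.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


domain_tokens = "the server runs the command and the server logs the command string".split()
reference_tokens = "the cat sat on the mat and the dog ran to the door".split()
top_two = keywords(domain_tokens, reference_tokens, top_n=2)
```

In this toy data the two highest-ranked words are "server" and "command", which are frequent in the domain list and absent from the reference list. A realistic run would need a reference corpus far larger than the domain corpus and some handling of case and tokenization.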

Keyword categorization
● High-ranking: highly domain-specific
  – data, plot, syntax, command, string, server
● Mid- and Low-ranking: potential for domain-specific homographs
  – worm, cloud, wizard, key, boot
● Keywords that are absent from or are extremely rare in the reference corpus: less frequent but also highly domain-specific
  – dotplot, ODBC, toolbar, widget, spyware, phishing

Keywords as nodes of MWTs
● Highly productive in forming domain-specific multi-word terms (MWTs)
● Have been successfully leveraged in term extraction research
● Successful search techniques include raw collocate frequency and DICE collocate relationship measure
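
The Dice coefficient for a word pair is 2·f(x, y) / (f(x) + f(y)), where f(x, y) is the frequency of the pair and f(x), f(y) are the frequencies of the individual words. The sketch below applies it to adjacent word pairs that contain a given keyword. Restricting collocates to adjacent bigrams, and the function name `dice_collocates`, are assumptions made for illustration; the slides do not describe the window or tool used.

```python
from collections import Counter


def dice_collocates(tokens, node, min_pair_freq=1):
    """Score adjacent bigrams containing `node` with the Dice coefficient, best first."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (left, right), pair_freq in bigrams.items():
        if node not in (left, right) or pair_freq < min_pair_freq:
            continue
        scores[(left, right)] = 2 * pair_freq / (unigrams[left] + unigrams[right])
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


tokens = "data source and data file and data source".split()
ranked = dice_collocates(tokens, "data")
```

Here "data source" scores 0.8 and "data file" scores 0.5. The pair "and data" also scores 0.8, which shows why a real extraction run would filter out function words before ranking candidates.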

Verbs
assess, overlay, remove, replicate, save, fail, choose, enter
display, customize, gage, delete, calculate, censor, edit, forecast, plot
specify, select, create, click, plot, display, access, test
run, return, enable, update, compute, tick, delete, design

Key findings
● Unigrams and bigrams make up the vast majority of termbase terms that occur frequently in the corpus.
● Homonymous terms are important to document in a termbase.

Key findings
● Verbs and adjectives are under-documented in termbases.
● Termbases are under-optimized when it comes to documenting frequent domain-specific terms. Only three to eight percent of the termbase terms occur very frequently in the corpus, and only 13 to 17 percent of the termbase terms occur frequently. Only one company managed to include a moderate level of frequent terms in its termbase (37 percent).

Conclusion
● For commercial terminography, the notion of termhood needs to take into account not only the traditional semantic criteria but also pragmatic and purpose-driven criteria.
● Terminography serving commercial purposes needs to be more corpus-driven.
● This is not “terminology” in the traditional sense. It includes various types of lexical resources.

Backup slides

In-house practices
● All 4 companies are interested in using terms in controlled authoring (CA); 3 companies are doing so already.
● Only one company (A) maintains all its data in a single termbase. The other companies maintain separate termbases for various purposes, such as CA and CAT. Company D has 15 termbases.
● Company B maintains 3 separate termbases: CA, CAT, and Authoring/Publishing aid.
● Company C uses automatic term extraction.
● Company D imports TM segments into the termbase to compensate for technical limitations of TM matching.

Most common termbase data categories
● Definition
● Part of speech
● Process status
● Usage status
● Term type

Most common termbase problems
● None of the companies use subject fields
● Only 2 companies consistently mark the part of speech
● There are widespread violations of:
  – Term autonomy
  – Concept orientation
  – Data elementarity/granularity

Corpus-valid termbase terms
● A notion defined strictly for the purposes of our research
● Terms that we “count” to measure the gap between the termbase and the corpus
● Terms in the termbase that can reasonably be expected to occur in the corpus
● Does not include terms with negative usage markers (do not use, deprecated, etc.)
● Does not include general lexicon words, due to application specificity for controlled authoring, reduced terminological “interest”, and high expected number of concordances
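
The two exclusion rules above can be expressed as a simple filter over termbase entries. This is only a sketch: the `TermEntry` structure, its field names and the contents of `NEGATIVE_USAGE` are hypothetical, and a real termbase export (for example TBX) would need its own mapping onto these fields.

```python
from dataclasses import dataclass

# Assumed labels for negative usage markers; extend to match the termbase's own values.
NEGATIVE_USAGE = {"do not use", "deprecated"}


@dataclass
class TermEntry:
    term: str
    usage_status: str = ""
    general_lexicon: bool = False  # hypothetical flag for general lexicon words


def is_corpus_valid(entry):
    """True if the entry has no negative usage marker and is not a general lexicon word."""
    status = entry.usage_status.strip().lower()
    return status not in NEGATIVE_USAGE and not entry.general_lexicon


def corpus_valid_terms(entries):
    """Terms that can reasonably be expected to occur in the corpus."""
    return [entry.term for entry in entries if is_corpus_valid(entry)]


entries = [
    TermEntry("incremental backup", usage_status="preferred"),
    TermEntry("back-up", usage_status="Deprecated"),
    TermEntry("click", general_lexicon=True),
    TermEntry("data source"),
]
valid = corpus_valid_terms(entries)
```

With these sample entries only "incremental backup" and "data source" pass the filter.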

Example of a filter for corpus-valid terms

● Termbase verbs occur 26,000 times
● Keyword verbs occur 90,000 times

TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...
 
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
 
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
 
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
 
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
 
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
 
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
 
Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
 Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann... Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
 
A translation memory P2P trading platform - to make global translation memory...
A translation memory P2P trading platform - to make global translation memory...A translation memory P2P trading platform - to make global translation memory...
A translation memory P2P trading platform - to make global translation memory...
 
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
 
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
 
Farmer Lv (TrueTran)
Farmer Lv (TrueTran)Farmer Lv (TrueTran)
Farmer Lv (TrueTran)
 
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
 
The Theory and Practice of Computer Aided Translation Training System, Liu Q...
 The Theory and Practice of Computer Aided Translation Training System, Liu Q... The Theory and Practice of Computer Aided Translation Training System, Liu Q...
The Theory and Practice of Computer Aided Translation Training System, Liu Q...
 
Translation Technology Showcase in Shenzhen
Translation Technology Showcase in ShenzhenTranslation Technology Showcase in Shenzhen
Translation Technology Showcase in Shenzhen
 
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
 
SDL Trados Studio 2017, Jocelyn He (SDL)
SDL Trados Studio 2017, Jocelyn He (SDL)SDL Trados Studio 2017, Jocelyn He (SDL)
SDL Trados Studio 2017, Jocelyn He (SDL)
 
How we train post-editors - Yongpeng Wei (Lingosail)
How we train post-editors - Yongpeng Wei (Lingosail)How we train post-editors - Yongpeng Wei (Lingosail)
How we train post-editors - Yongpeng Wei (Lingosail)
 
A use-case for getting MT into your company, Kerstin Berns (berns language c...
 A use-case for getting MT into your company, Kerstin Berns (berns language c... A use-case for getting MT into your company, Kerstin Berns (berns language c...
A use-case for getting MT into your company, Kerstin Berns (berns language c...
 
QE integrated in XTM, by Bob Willans (XTM)
QE integrated in XTM, by Bob Willans (XTM)QE integrated in XTM, by Bob Willans (XTM)
QE integrated in XTM, by Bob Willans (XTM)
 

Último

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 

Último (20)

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 

Closing the Gap between Corpora and Termbases, CHAT2013

  • 1. Developing effective terminological resources for commercial use or Narrowing the gap between termbases and corpora in commercial environments Kara Warburton CHAT 2013 © 2013 by Termologic. All rights reserved. 1
  • 2.
      1. Motivation of my research
      2. Does commercial language contain terminology?
      3. What is a term?
      4. Our assumption about termbases and corpora
      5. Aim and methodology of our research
      6. Corpus-valid terms
      7. The gap between termbases and corpora
      8. Causes of the gap
      9. Keywords and their potential
      10. Conclusions
  • 3. Personal motivation
      ● The established principles and methodologies of terminology management don't seem to “fit” the needs of commercial uses of terminology
      How do I resolve this apparent conflict?
      ● Study how terminology is managed in commercial settings
      ● Identify key issues, gaps with mainstream methodology and theory
  • 4.
      Mainstream theory and practice      Commercial needs
      Strict ties to translation          Polyvalent
      Restricted focus of termbases       Polyvalent
      Normative                           Prescriptive and descriptive
      Onomasiological                     Largely semasiological
      Thematic                            Ad-hoc
      Univocal                            Multivocal
      Objectivist, concept focus          Communicative, language focus
      Philosophical, social concern       Commercial concern
  • 5. What is terminology?
      ● (Terminology is) the science studying the structure, formation, development, usage and management of terminologies in various subject fields.
      ● (A terminology is a) set of designations belonging to one special language.
      ● (A special language is) a language used in a subject field and characterized by the use of specific linguistic means of expression.
      ● (A subject field is) a field of special knowledge.
      (ISO 1087-1, 2000)
  • 6. According to these definitions
      ● An LSP (special language) contains terminology.
      ● Key criteria for LSP:
          – Subject field
          – Specific linguistic means of expression
      ● Therefore:
          – If commercial language is an LSP, then it contains terminology.
          – Commercial language is an LSP if it:
              ● can be viewed as a type of subject field
              ● has specific linguistic means of expression
  • 7. What is a subject field? What is “special” knowledge?
      ● Pure and applied sciences, techniques, technologies, specialized activities
      ● Professional activities carried out in business, industry, companies, and professional settings
      ● Any specialized activity carried out by humans
  • 8. What are “specific linguistic means of expression”?
      ● Textual characteristics
          – concision, precision, depersonalisation, economy, referentiality, preponderance of nominal structures, dominance of written form
      ● Communicative situation: formal, professional
      ● Communicative purpose
          – inform, educate
          – objective, precise, concise, and unambiguous exchange of information
      ● Conscious acquisition
  • 9. Commercial language is an LSP
      ● Describes tangible products, services, and activities, often within one vertical industrial or economic sector, which could be viewed as a subject field
      ● Adheres to specific linguistic rules and styles; many companies have a style guide, and some enforce the style rules automatically through controlled authoring software
      ● Written form predominates
      ● Informative purpose
  • 10. What is a term?
      ● General theory: the designation of an object the conceptualization of which can be classified into a system of concepts
      ● Socio-cognitive theory: a natural language representation of a unit of understanding, considered relevant to given purposes, applications, or groups of users
      ● Lexico-semantic theory: a construct that takes shape through an analysis which gives consideration to corpus evidence, subject-matter relevance, and the purpose of the terminographical product
      ● Textual theory: a semantically-charged linear structure that contributes to texture (coherence and cohesion) in an LSP text
      ● Communicative theory: all the above
  • 11. What is a term for commercial terminography?
      ● Semantic membership in a subject field is a guiding criterion
      ● But bringing benefit to the company is the primary criterion
      ● Companies have diverse needs, requiring diverse types of terminological resources
      ● ......
  • 12. Applications of terminology
      ● computer-assisted translation
      ● controlled authoring
      ● content management, automatic content classification
      ● product classification
      ● indexing, SEO, keyword management, etc.
      EACH of these applications requires a HIGH LEVEL of correspondence between the termbase and the company corpus.
  • 13. A term is...
      ● ANY lexical unit that can bring benefit to the company by being “managed” is a candidate “term”. This MAY include:
          – General lexicon words
          – Phrases
          – TM segments
          – Proper nouns
          – Variants
          – Non-nouns, especially verbs
  • 14. Aim of our research
      ● Compare termbases and corpora in four IT companies to see how well the termbase terms correspond to the corpus
      ● Establish the scope of the gap
      ● Explain the gap
      ● Identify ways to reduce the gap
  • 15. Methodology of the research
      ● Obtain termbases in export files from 4 different systems
      ● Convert to TBX
      ● Import to TermWeb
      ● Apply necessary filters for different evaluations
      ● Obtain and prepare corpora for analysis
      ● Export corpus-valid terms from termbase
      ● Run batch concordance of termbase terms
      ● Statistically analyze results
      ● Identify patterns, investigate solutions, including keywords and DICE ranking
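The "batch concordance" step above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the tooling used in the study: the function name `batch_concordance`, the case-insensitive whole-word matching, and the toy corpus are all hypothetical.

```python
import re
from collections import Counter

def batch_concordance(terms, corpus_text):
    """Count case-insensitive whole-word occurrences of each term in the corpus.

    Hypothetical helper: the matching rules (case folding, word boundaries)
    are assumptions, not the settings used in the original research.
    """
    counts = Counter()
    for term in terms:
        # Lookarounds keep "data source" from matching inside "metadata sources".
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        counts[term] = len(re.findall(pattern, corpus_text, flags=re.IGNORECASE))
    return counts

# Toy data for illustration only.
corpus = "Run an incremental backup. The incremental backup uses the data source."
terms = ["incremental backup", "data source", "automatic incremental backup"]

hits = batch_concordance(terms, corpus)
absent = [t for t in terms if hits[t] == 0]  # termbase terms never seen in the corpus
```

Terms that end up in `absent` are the kind of entry the slides later call under-performing termbase terms.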
  • 16. Profile of the companies
      ● HQ in USA but global presence, all in IT sector
      ● Company A
          – Field: statistics
          – 330 employees
          – Across language server, CrossDesk, CrossAuthor, xMetal, CrossTerm
      ● Company B
          – Field: business analytics
          – 13,000 employees
          – Acrolinx IQ, in-house CAT tools, TermWeb
  • 17. Profile of the companies
      ● Company C
          – Field: information security, storage, management
          – 18,500 employees
          – SDL WorldServer, Acrolinx IQ
      ● Company D
          – Field: hardware (PCs, servers, printers, networking), software
          – 330,000 employees
          – SDL Trados and MultiTerm
  • 18. Size of data
          Corpus size in tokens   Terms from termbase   Size of corpus in relation to termbase
      1   3,973,265               1,777                 2,236
      2   19,808,928              6,441                 3,075
      3   22,136,564              4,195                 3,074
      4   400,777                 4,385                 91
  • 19. The gap between termbases and corpora
      Range 0 + A:
      Company A   35%
      Company B   63%
      Company C   73%
      Company D   76%
  • 20. Causes of the gap
      A) Under-performing termbase terms – termbase terms that are absent or are infrequent in the corpus (generally, redundant terms)
      B) Under-documented corpus terms – corpus terms that are either entirely missing from the termbase (nonextant terms) or are in insufficient number in the termbase (infrequent terms).
  • 21. Under-performing termbase terms
      ● Upper-case terms
      ● Excessively long terms
      ● TM segments
      ● Terms with unessential modifiers (boundary setting problem)
      ● Terms with proper name modifiers
  • 22. Term boundary problems
      Nonextant term                      Adjusted term              Frequency
      bad cluster                         cluster                    8,490
      automatic incremental backup        incremental backup         521
      sequential mean squares             mean squares               129
      absolute correlation coefficient    correlation coefficient    330
      individual fitted values            fitted value               270
      active data source                  data source                7,201
      critical success factor component   critical success factor    540
      printhead failure                   printhead                  275
  • 23. The cost of redundant terms
      ● IT cost
      ● Reduced efficiency due to dilution of “good” entries
      ● Cost of creating and maintaining the entries
  • 24. Under-documented corpus terms
      ● Variants
      ● Non-nouns (particularly verbs)
      ● Homographs
      ● Terms with optimally-set boundaries
      ● Multi-word terms containing a keyword
      ● Adjectives that are productive in forming MWTs
  • 25. Keywords
      ● A keyword is a word that is unusually frequent in a corpus and is therefore likely a domain-specific unigram term
      ● Determined by comparing word-frequency lists from a domain corpus and a general-purpose (reference) corpus
      ● Good indicators of the key topics of a corpus
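As a rough illustration of comparing word-frequency lists from a domain corpus and a reference corpus, the sketch below ranks words by a smoothed ratio of relative frequencies. The slides do not say which keyness statistic was used, so the ratio, the `smoothing` parameter, and the function name `keyword_scores` are assumptions made for this example; corpus tools often use log-likelihood or similar measures instead.

```python
from collections import Counter

def keyword_scores(domain_tokens, reference_tokens, smoothing=1.0):
    """Rank domain words by how much more frequent they are than in a reference corpus.

    Assumed measure: smoothed relative-frequency ratio (not necessarily the
    statistic used in the original research).
    """
    dom = Counter(domain_tokens)
    ref = Counter(reference_tokens)
    dom_total = sum(dom.values())
    ref_total = sum(ref.values())
    scores = {}
    for word, freq in dom.items():
        dom_rate = (freq + smoothing) / (dom_total + smoothing)
        # Smoothing keeps words absent from the reference corpus from dividing by zero.
        ref_rate = (ref[word] + smoothing) / (ref_total + smoothing)
        scores[word] = dom_rate / ref_rate
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy corpora for illustration only.
domain = "the server runs the command on the server".split()
reference = "the cat sat on the mat and the dog ran".split()

ranked = keyword_scores(domain, reference)
score_of = dict(ranked)
```

In this toy data `server` ranks first, while `the` scores low because it is just as common in the reference corpus.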
  • 26. Keyword categorization
      ● High-ranking - highly domain-specific
          – data, plot, syntax, command, string, server
      ● Mid- and Low-ranking - potential for domain-specific homographs
          – worm, cloud, wizard, key, boot
      ● Keywords that are absent from or are extremely rare in the reference corpus - less frequent but also highly domain-specific
          – dotplot, ODBC, toolbar, widget, spyware, phishing
  • 27. Keywords as nodes of MWTs
      ● Highly productive in forming domain-specific multi-word terms (MWTs)
      ● Have been successfully leveraged in term extraction research
      ● Successful search techniques include raw collocate frequency and DICE collocate relationship measure
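The Dice coefficient for a word pair is commonly defined as 2·f(x,y) / (f(x) + f(y)), where f(x,y) is the co-occurrence frequency and f(x), f(y) are the individual word frequencies. A minimal sketch of ranking the left-hand collocates of a keyword node by that measure follows; restricting collocates to the immediately preceding word, and the name `dice_collocates`, are simplifying assumptions for illustration rather than the setup used in the study.

```python
from collections import Counter

def dice_collocates(tokens, node):
    """Score bigrams ending in `node` with the Dice coefficient.

    Assumption: only the word directly to the left of the node is treated
    as a collocate; real extraction tools usually allow wider windows.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (left, right), pair_freq in bigrams.items():
        if right == node:
            scores[(left, right)] = 2 * pair_freq / (unigrams[left] + unigrams[right])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy token stream for illustration only.
tokens = "incremental backup and full backup after incremental backup".split()
ranked_pairs = dice_collocates(tokens, "backup")
```

Here `incremental backup` outranks `full backup` because `incremental` occurs almost exclusively next to the node, which is the intuition behind using Dice to find multi-word term candidates around a keyword.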
  • 28. Verbs
      assess, overlay, remove, replicate, save, fail, choose, enter, display, customize, gage, delete, calculate, censor, edit, forecast, plot
      specify, select, create, click, plot, display, access, test, run, return, enable, update, compute, tick, delete, design
  • 29. Key findings
      ● Unigrams and bigrams make up the vast majority of termbase terms that occur frequently in the corpus.
      ● Homonymous terms are important to document in a termbase.
  • 30. Key findings
      ● Verbs and adjectives are under-documented in termbases.
      ● Termbases are under-optimized when it comes to documenting frequent domain-specific terms. Only three to eight percent of the termbase terms occur very frequently, and only 13 to 17 percent of the termbase terms occur frequently. Only one company managed to include a moderate level of frequent terms in its termbase (37 percent).
  • 31. Conclusion
      ● For commercial terminography, the notion of termhood needs to take into account not only the traditional semantic criteria but also pragmatic and purpose-driven criteria.
      ● Terminography serving commercial purposes needs to be more corpus-driven.
      ● This is not “terminology” in the traditional sense. It includes various types of lexical resources.
  • 32. Backup slides
  • 33. In-house practices
      ● All 4 companies are interested in using terms in controlled authoring (CA); 3 companies are doing so already.
      ● Only one company (A) maintains all its data in a single termbase. The other companies maintain separate termbases for various purposes, such as CA and CAT. Company D has 15 termbases.
      ● Company B maintains 3 separate termbases: CA, CAT, and Authoring/Publishing aid.
      ● Company C uses automatic term extraction.
      ● Company D imports TM segments into the termbase to compensate for technical limitations of TM matching.
  • 34. Most common termbase data categories
      ● Definition
      ● Part of speech
      ● Process status
      ● Usage status
      ● Term type
  • 35. Most common termbase problems
      ● None of the companies use subject fields
      ● Only 2 companies consistently mark the part of speech
      ● There are widespread violations of:
          – Term autonomy
          – Concept orientation
          – Data elementarity/granularity
  • 36. Corpus-valid termbase terms
      ● A notion defined strictly for the purposes of our research
      ● Terms that we “count” to measure the gap between the termbase and the corpus
      ● Terms in the termbase that can reasonably be expected to occur in the corpus
      ● Does not include terms with negative usage markers (do not use, deprecated, etc.)
      ● Does not include general lexicon words, due to application specificity for controlled authoring, reduced terminological “interest”, and high expected number of concordances
  • 37. Example of a filter for corpus-valid terms
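The filter shown on this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of what a corpus-valid filter could look like, based only on the two exclusion rules listed on the previous slide. The entry layout and the field names `usage_status` and `general_lexicon` are hypothetical and do not reflect TermWeb's actual filter settings or data model.

```python
# Hypothetical set of negative usage markers; the slides name "do not use" and "deprecated".
NEGATIVE_USAGE = {"do not use", "deprecated"}

def corpus_valid_terms(entries):
    """Keep terms that can reasonably be expected to occur in the corpus.

    Assumed entry format: dicts with a "term" key plus optional
    "usage_status" and "general_lexicon" fields (names invented for this sketch).
    """
    return [
        entry["term"]
        for entry in entries
        if entry.get("usage_status", "").lower() not in NEGATIVE_USAGE
        and not entry.get("general_lexicon", False)
    ]

# Toy termbase export for illustration only.
entries = [
    {"term": "incremental backup", "usage_status": "preferred"},
    {"term": "hit the button", "usage_status": "Do not use"},
    {"term": "click", "usage_status": "preferred", "general_lexicon": True},
    {"term": "printhead"},
]

valid = corpus_valid_terms(entries)
```

Only `incremental backup` and `printhead` pass the filter in this toy data; the other two entries are excluded by the negative usage marker and the general-lexicon flag respectively.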
  • 38.
      ● Termbase verbs occur 26,000 times
      ● Keyword verbs occur 90,000 times