The Use of Corpus Linguistics in Lexicography

An Integrative Review

Lexicography
ENGL 6203

Submitted by:
Ihsan Ibadurrahman (G1025429)
Syareen Izzaty Bt Majelan (G1029580)
Rudiana Razali (G1115202)

I. Introduction

The practice of English dictionary-making began in the early 1600s, when Robert Cawdrey compiled
a dictionary of words that were deemed difficult because they had been borrowed from other
languages (Siemens, 1994). The words were taken from Latin-English dictionaries and from
available texts of the time, and each was given a concise definition, a synonym, and a fixed
form (Siemens, 1994). It was Samuel Johnson who, in the 1700s, explicitly set out the methods
he followed in creating his dictionary. Some of these methods were later adopted by the
committee entrusted in the 1800s with creating “A New Dictionary”, now known as the Oxford
English Dictionary.
A corpus is a collection of samples of authentic spoken and written text that is used for
the analysis of words, meanings, grammar, and usage (David, 1992). In Saussurean terminology,
a text is akin to parole, while the corpus provides the evidence of langue (Tognini-Bonelli,
2001). The term corpus linguistics is used when a corpus is specifically used to study a
language. Lindquist (2009: 1) distinguishes it from other branches of linguistics such as
sociolinguistics (the study of language and society) or psycholinguistics (the study of
language and the mind) in that corpus linguistics is a specific method used in language study:
the “how to” rather than the “what”. In other words, corpus linguistics is an approach rather
than a specific field of language study (Gries, 2009).

This paper aims to highlight major findings in the literature on corpus linguistics, with an
added emphasis on its use in dictionary-making. In developing this integrative literature
review, 18 sources were obtained: 13 books, 2 journal articles, and 3 online articles. After
the literature was reviewed, recurring ideas were compared, listed, and discussed. For ease of
reading, the review is organized under separate subheadings: the pre-corpus era, the initial
stage of corpus linguistics, modern corpus linguistics, and the use of corpora in language
teaching.

II. Literature Review

a. Pre-corpus linguistics
Robert Cawdrey's Table Alphabeticall (1604) is considered to be the first monolingual English
dictionary, even though glosses of words had been produced before it (Jackson, 2002). Cawdrey's
dictionary consisted of 2543 'hard' words: loanwords that were considered difficult for the
'uneducated' reader to learn. The words were gathered from Latin-English dictionaries and from
glosses of religious, legal, and scientific texts (Siemens, 1994). Cawdrey provided a concise
definition of each word, a synonym or explanatory phrase, and a fixed form for many of the
difficult words (Siemens, 1994; Jackson, 2002). After Cawdrey's dictionary appeared, much
effort was devoted to improving the quality of dictionaries, and subsequent dictionaries
followed Cawdrey's method of extracting 'hard' words from different texts and including them
in the dictionary.

In 1755, Samuel Johnson published a two-volume dictionary that he had worked on for nine
years (Jackson, 2002). It remained the standard English dictionary for 150 years, until the
Oxford English Dictionary was conceived in England, and it was the first dictionary to use
quotations to show how each word was used (Baugh & Cable, 2002). In a letter to his patron,
Johnson wrote that adding a word to the dictionary raised difficulties in the following
order:

1) Selecting words. Johnson had to decide which words to include in the dictionary and to
classify each word as foreign or English, since English had borrowed heavily from other
languages. He also had to decide whether words from specific professions should be included.
2) Orthography. Johnson proposed that no change should be made to the spelling of words
without sufficient reason, because change would only inconvenience others and is a mark of
weakness or inconsistency.
3) Pronunciation. Johnson held that, along with orthography, pronunciation should remain
constant, because stability is important to the lifespan of a language and any change would
create what is almost a new speech, corrupting the spoken English of the time.
4) Etymology and derivation. It is important to know the etymology of a word because, given
the number of borrowings from different languages, it is hard to discern which words are
native to English.
5) Analogy. The rules that govern how the words are used are included.
6) Syntax. The construction of each word is shown, because the construction of English is
too inconsistent to be reduced to rules alone.
7) Phraseology. The phrases in which a word is used are included to illustrate the
different ways it can be used.
8) Interpretation. Johnson considered the interpretation of a word to be the most difficult
part of creating the dictionary, because he had to examine the different usages of each
word and arrive at the best explanation of it.
9) Distribution. After all the above steps had been taken, Johnson placed each word into
its proper class.
After serving for more than 150 years as the main source of reference, with several revisions,
Johnson's dictionary was found to be inadequate for the standards of modern scholarship
(Jackson, 2002). In 1857 a committee was appointed to collect words missing from the
dictionary, to be added as a supplement. The committee found that a supplement would not be
enough, and in 1858 it was decided that a new dictionary should be created (Baugh & Cable,
2002; Jackson, 2002). The main aims of the new project were to record every word found in
English from about the year 1000 and to exhibit the history of each through a selection of
quotations from the whole range of English writing (Baugh & Cable, 2002). The editors gathered
a total of six million slips containing quotations from volunteers, not only in England but
all over the world. After 24 years of hard work, in 1884, they published the first instalment
of the dictionary, covering part of the letter A. Sixteen years later, four and a half volumes
had been published, reaching the letter H. Finally, in 1928, the last section of the
dictionary was issued, completing "A New Dictionary", now known as the Oxford English
Dictionary (OED), some 70 years after the project began (Baugh & Cable, 2002). The committee
drew up rules to be observed by the editors of the OED before a word could be included in the
dictionary, in the following order (Considine, 1996):

1) The Word to be explained.
2) The Pronunciation and Accent.
3) The Various Forms assumed by the word, and its principal grammatical inflexions.
4) The Etymon of the word.
5) The Cognate Forms in kindred languages.
6) The Meanings which are logically deduced from the Etymology, and arranged to show
the common thread or threads which unite them together.
Even though more than a century had passed since Johnson created his dictionary, some of the
steps he took were still used in creating the OED. This shows that Johnson's methods remained
relevant to lexicographers and were the main steps in making a dictionary before corpus
linguistics was introduced into dictionary-making.

b. The initial stage of corpus linguistics

In the 1950s, there was growing dissatisfaction that language theory (e.g. Noam Chomsky's
Syntactic Structures) could not account for the many 'ungrammatical' patterns found in English
(e.g. the distinction between transitive and intransitive verbs). There was a strong call for
empirical, real language data (Teubert, 2004). It was in this climate that the first corpora
were compiled. Two early projects were the Survey of English Usage at the University of London
and the Brown University Corpus in Providence. In the 1960s, the latter compiled a
one-million-word corpus of written text from 500 text samples, which became known as the Brown
Corpus. This American corpus was a landmark in corpus linguistics, since it was the first
corpus to employ a computer in its making. In 1982, the British counterpart, named the LOB
corpus, was compiled by Hofland and Johansson. LOB stands for Lancaster-Oslo/Bergen and, as
the name suggests, it was a collaboration between three institutions: the University of
Lancaster, the University of Oslo, and the Norwegian Computing Centre for the Humanities in
Bergen.

However, both the Brown corpus and the LOB corpus were deemed inadequate as samples of
English vocabulary. This gave rise to John Sinclair's English Lexical Studies, which
specifically aimed to investigate vocabulary using electronic texts of spoken and written
language. The study gave prominence to collocation: words that naturally co-occur. Aiming to
represent varieties of English in places where it is used as a first or second language,
Sidney Greenbaum compiled a set of one-million-word corpora called the International Corpus
of English in 1988. A distinctive feature of this corpus is that it samples more spoken
language (60%) than written language (40%).

In the early 1990s, major universities and publishers together compiled the British National
Corpus (BNC), containing 100 million words from texts dated 1980 to 1993. The compilers were
Oxford University Press, Longman, Chambers, the British Library, Oxford University, and
Lancaster University. The aim was to provide a balanced corpus that represents British English.
The corpus consists of 10% spoken language and 90% written language, the latter comprising 25%
fiction and 75% non-fiction. One major distinction between the BNC and Brown is that the former
took much longer samples, of between 40,000 and 50,000 words each. This makes the BNC more
representative, since a text uses words differently at the beginning, in the middle, and at
the end (Lindquist, 2009). Owing to its sheer size, its representativeness, and the care taken
in its compilation, most British publishers prefer this corpus as their source of
lexicographic information.

Typically, any corpus needs to go through a three-step process in its making. Before these
three steps, however, the compiler needs to determine the basic outline of the corpus, such as
its size, its genre, and whether it will cover written language, spoken language, or both.
Sinclair (1996) points out that a corpus should be as large as possible and should include
samples from a broad range of material, so as to achieve a degree of representativeness within
the limits of the technology of the time. The corpus should also be classified into different
genres, with samples of even size. Once this basic outline is determined, the three-step
process may begin. The first step is collecting the data, spoken and/or written. This entails
gathering a large mass of speech and written texts, obtaining permission, and keeping careful
and organized records. The next step is computerization, which entails converting raw spoken
or written text into a digital format. Recorded speech can be painstaking to process, since
it needs to be transcribed manually. Another concern with spoken text is naturalness: speech
needs to be recorded in a natural, casual way that resembles how people speak in everyday
life, not in a stilted way. Written texts seem less painstaking, but they have problems of
their own, mainly copyright. In addition, some texts from books, magazines, and other written
sources need to be retyped, since optical character recognition (OCR) software, which scans
and detects words automatically, usually produces errors, so many that it may be best to avoid
it altogether. The last step is annotation, which involves adding information such as part of
speech or etymology to each item of data. It should be noted that the three steps need not be
seen as separate processes; they are closely connected. For example, after recordings of
speech have been gathered, it may be best to transcribe them there and then.
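
The annotation step described above can be illustrated with a minimal sketch in Python. It
assumes a tiny hand-made lexicon (TOY_LEXICON) and an invented tag set; the names Token,
annotate, and to_tagged_text are hypothetical and are not taken from any of the corpora
discussed here. A real project would rely on a trained tagger and on manual checking.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical toy lexicon mapping words to part-of-speech tags.
# This is an assumption for illustration only, not a real tag set.
TOY_LEXICON = {
    "the": "DET", "a": "DET", "every": "DET",
    "corpus": "NOUN", "word": "NOUN",
    "records": "VERB",
}


@dataclass
class Token:
    text: str  # the word as it appears in the text
    pos: str   # part-of-speech tag, "UNK" if the toy lexicon has no entry


def annotate(sentence: str) -> List[Token]:
    """Split a sentence into words and attach a tag to each one."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return [Token(w, TOY_LEXICON.get(w.lower(), "UNK")) for w in words]


def to_tagged_text(tokens: List[Token]) -> str:
    """Render tokens in a simple word_TAG format."""
    return " ".join(f"{t.text}_{t.pos}" for t in tokens)


print(to_tagged_text(annotate("The corpus records every word.")))
```

Storing the tag next to each word, rather than in a separate file, is only one possible
design; it keeps the example short and makes the annotated text easy to search.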
Corpora have contributed a great deal to language study, but their impact on lexicography did
not begin until 1989. Together with advances in computer software, corpora have since
contributed significantly to the development of lexicography. Since much of the work is
automated and recorded in digital format, lexicographers can now save time and much of the
tremendous effort needed to compile a dictionary. Typically, a dictionary gives information
on the part of speech, usage, meaning, pronunciation, and etymology of a word. Before the
advent of corpora, all this information had to be gathered manually; lexicographers had to do
the hard labor of collecting slips of paper containing the text that they intended to include
in the dictionary. For this reason, it took roughly 50 years to complete the New English
Dictionary, which was later known as the Oxford English Dictionary (Meyer, 2002). With
corpora, dictionary makers can now use a large sample of authentic spoken and written text as
a source to illustrate how each word in their list is used in real life. The citations used in
a dictionary come from real-life discourse. Real contexts also support accurate, well-defined
lexical meanings in dictionary definitions, which is a great improvement over previous
practice, in which words were defined from the writer's point of view rather than from
systematic evidence. Another improvement is the rich information now available for words that
have many variant meanings, such as take, go, and time, which tended to be overlooked in
previous dictionary practice (Lindquist, 2009).

Another major advantage of using corpora in lexicography is that information on word
frequency can be obtained. Lexicographers can thus indicate whether a word is among the 500
most common words, the next 500, and so on. Meyer (2002) notes that the most frequent words
are function words such as the, an, a, and, and of, which carry little lexical meaning, and
that the least frequent words are content words such as proper nouns. Gries (2009) mentions
two kinds of frequency information that lexicographers can obtain from a corpus: frequencies
of occurrence of linguistic elements, given in a so-called frequency list, and frequencies of
co-occurrence of these elements, given in concordances. Lindquist (2009: 5) defines a
concordance as “a list of all the contexts in which a word occurs in a particular text”. Using
a Key Word in Context (KWIC) concordance, words can be retrieved together with their
surrounding text and presented vertically on the screen. Since the information is presented in
context, lexicographers can easily identify the collocations of each word in their dictionary.
Below is an excerpt from concordance software in which the word “corpus” is highlighted.

Figure 1: Concordance output from AntConc 3.2.2w (Gries, 2009).
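
The way a KWIC concordance lines up each occurrence of a word with its surrounding text can
be sketched in a few lines of Python. This is an illustrative sketch only and does not
reproduce how AntConc works internally; the function name kwic, its width parameter, and the
sample text are assumptions made for this example.

```python
import re


def kwic(text, keyword, width=3):
    """Return one display line per occurrence of keyword.

    Each line shows up to `width` words of left and right context, with the
    keyword in square brackets. The left context is right-aligned so that
    the keywords form a vertical column.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30}  [{token}]  {right}")
    return lines


# Invented sample text, used only to demonstrate the layout.
sample = "A corpus is a collection of texts. The corpus provides evidence of usage."
for line in kwic(sample, "corpus", width=2):
    print(line)
```

Because every keyword appears in the same column, the words immediately to its left and right
can be scanned quickly, which is what makes collocations easy to spot.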

Figure 1 illustrates the concordance software AntConc in use. It should be noted that the
software does not come with a ready-made corpus; users need to supply a text file of their own
to generate KWIC output. The latest version of the software is 3.2.4w, and it can be
downloaded at http://www.antlab.sci.waseda.ac.jp/software.html. Similar software that
lexicographers may use to see how words are used in context is WordSmith Tools, devised by
Mike Scott in 1993. The software has since gone through many changes and now includes a
concordancer, a word-listing tool, a web text downloader, and many other features (Wikipedia,
2011). Previous versions of the software were owned and sold by Oxford University Press. The
current version is owned by Lexical Analysis Software Ltd. The current WordSmith version is
5.0, and it can be downloaded at:
http://www.lexically.net/wordsmith/version5/index.html. However, unlike AntConc, WordSmith
is shareware. To unlock the demo version from the website, users need to pay for a
single-user license, which costs £50, or around $70-80, from either of two online retailers
(Lexical Analysis Software and Oxford University Press).
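
Word-listing of the kind mentioned above, together with the idea of grouping words into
bands of 500 by frequency, can be sketched as follows. The function names frequency_list and
frequency_band and the sample sentence are assumptions for illustration; they are not part of
AntConc or WordSmith Tools.

```python
import re
from collections import Counter


def frequency_list(text):
    """Return (word, count) pairs, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by descending count, then alphabetically so that ties are stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def frequency_band(rank, band_size=500):
    """Map a 1-based rank to a band: ranks 1-500 give 1, 501-1000 give 2, and so on."""
    return (rank - 1) // band_size + 1


# Invented sample text; a real word list would be built from a full corpus.
sample = "the corpus shows the use of the word and the frequency of the word"
for rank, (word, count) in enumerate(frequency_list(sample), start=1):
    print(rank, word, count, "band", frequency_band(rank))
```

In this small sample the function word "the" comes first, which is in line with Meyer's
observation that the most frequent items carry little lexical meaning.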

Since a corpus is discourse-based, a word appears in it as a haphazard, arbitrary
collection of occurrences, as illustrated in Figure 1. Dictionary makers need to check for
contradictions with the 'real' meaning. It is thus dangerous to depend solely on a corpus
(Teubert, 2004). One way to check a word in context is to expand the text by retrieving its
original source. This feature is lacking in both programs mentioned previously, AntConc and
WordSmith Tools. Fortunately, it is available for free from the Brigham Young University
website, which provides a concordancer for the BNC, COCA (the Corpus of Contemporary American
English), and some other corpora, and can be accessed at:
http://corpus.byu.edu/

The huge amount of data in a corpus also allows lexicographers to look for new words
that occur for the first time in spoken or written text. However, the corpus has to be large
enough to yield information on vocabulary items (Meyer, 2002). A small corpus such as the LOB
corpus, which holds roughly one million words, cannot give lexicographers enough
information on the range of vocabulary items. A monitor corpus is also needed, in which large
amounts of language data are added over time rather than fixed at one particular period.
This way, the corpus is frequently updated with new words and meanings from today's growing
language.
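
One simple way to picture how a monitor corpus surfaces new words is to compare each newly
added batch of text against the vocabulary already recorded. The sketch below is a simplified
assumption rather than a description of any actual monitor corpus: the function name
new_word_candidates, the min_count threshold, and the sample texts are all invented for
illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def new_word_candidates(reference_texts, new_texts, min_count=2):
    """Return words that are absent from the reference texts but occur
    at least `min_count` times in the newly added texts."""
    known = set()
    for text in reference_texts:
        known.update(tokenize(text))
    counts = Counter(
        word
        for text in new_texts
        for word in tokenize(text)
        if word not in known
    )
    return sorted(word for word, count in counts.items() if count >= min_count)


# Invented sample data standing in for an older corpus and a newer batch.
older = ["people send letters and read books"]
newer = ["people blog and read blogs", "they blog every day", "a selfie"]
print(new_word_candidates(older, newer))
```

The threshold is a design choice: requiring more than one occurrence filters out misprints
and one-off coinages, at the cost of reporting a genuinely new word a little later.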

The first dictionary to be founded wholly on a corpus is the Collins COBUILD English
Language Dictionary series, first compiled in 1987 under the guidance of John Sinclair. The
dictionary takes its citations from real-life discourse, and each word is defined on the basis
of these authentic texts instead of relying on previous dictionaries. This entails using a
very large corpus, so that all lemmas and their word senses can be included. However, this
presents a problem in that rare words such as apothegm tend to be excluded (Teubert, 2004).
Besides being the first corpus-based dictionary, COBUILD is innovative in that its definitions
resemble a classroom teacher explaining the words. For example, in describing the word junk,
it says: “You can use junk to refer to old and second-hand goods that people buy and collect”
(Jackson, 2002).

In the practice of dictionary-making, one crucial distinction has to be made between a
corpus-based dictionary and a corpus-driven dictionary. Dictionaries such as those in the
Collins COBUILD series are said to be corpus-based if the corpus is used to validate
information presented in the dictionary. However, if the corpus is used to extract the
information presented in the dictionary, the dictionary is called corpus-driven. Teubert
(2004: 112) suggests that dictionaries should follow the corpus-driven approach so that they
may complement standard linguistics and not just extend it.

c. Modern corpus linguistics

During the 1970s, computational research on English did not develop much in Birmingham,
because much effort was spent on devising software packages, instituting undergraduate
courses, and influencing opinion on campus (Sinclair, 1991). At that time, when computing
resources were severely restricted, the importance of data processing was highlighted. It has
taken approximately fifty years to make real progress in corpus-based linguistics, driven by
systems that work and methodologies that can provide reasonable coverage of linguistic
phenomena (Lawler & Dry, 1998). Over the years, computational resources such as fast machines
and sufficient storage have become increasingly accessible, making it possible to process
large volumes of data. In addition, there is a growing availability of corpora with linguistic
annotation, for example part of speech, prosodic intonation, and proper names, as well as
bilingual parallel corpora. Furthermore, the maturing of computational linguistics technology
has improved the commercial market for natural language products, and corpus linguistics
today is equipped with efficient parsing and statistical techniques.
From 1980 to 1986, computational work on language was put to good effect and was transformed
into a completely new set of techniques for language observation, analysis, and recording.
This also led to the editing of substantial dictionaries using these techniques and a huge
database of annotated examples.

One of the most prominent uses of corpora in recent years is as a resource for
lexicography. Corpus-based lexicographic work had previously been carried out for a small
number of languages. Only recently has the need for very large corpora come to the fore.
Collaboration between lexicography and Natural Language Processing (NLP) has encouraged the
use of corpora in dictionary projects that have access to very large corpora (Hua, 2001).
The computer plays a clerical role in lexicography, reducing the labor of sorting and filing
and examining very large amounts of English in a short time (Sinclair, 1991). In the late
1970s, the prospects of computerized typesetting were growing more realistic. By the early
1980s, a multi-million-word corpus had become available for study, though still on a limited
basis. From simple tools, the field has made substantial progress and produced crucial,
profound, and basic linguistic generalizations (Lawler & Dry, 1998). These more developed
tools have opened up many topics for inquiry that had not been well explored by traditional
linguistic methods.
In the modern era, the word corpus has been reserved for collections of texts that are stored
and accessed electronically. Electronic corpora are usually larger than the small, paper-based
collections previously used to study aspects of language (Hunston, 2002). This is because
computers can store and process far larger amounts of information than was previously
possible.
One important piece of work in corpus linguistics is that of Johansson and colleagues, who
produced a parallel corpus of British English and so made it possible for researchers to
scrutinize and view texts at greater length than before. The main structural features of
these corpora are:
- A classification of printed texts into genres (15).
- A large number (500) of fairly short extracts (2000 words each), giving a total of around
one million words.
- A close to random selection of extracts within genres.
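
The structure listed above, with extracts of fixed length drawn close to randomly within each
genre, can be sketched in Python. The sketch is an assumption about how such sampling might be
done, not a record of the procedure the compilers followed; the function name sample_extracts,
its parameters, and the use of a fixed random seed are all choices made for this example.

```python
import random

# 500 extracts of 2000 words each give the one-million-word total.
PLANNED_SIZE = 500 * 2000


def sample_extracts(texts_by_genre, per_genre, extract_words=2000, seed=0):
    """Pick `per_genre` texts at random within each genre and cut one
    extract of `extract_words` words from a random position in each."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    extracts = []
    for genre in sorted(texts_by_genre):
        texts = texts_by_genre[genre]
        chosen = rng.sample(texts, min(per_genre, len(texts)))
        for text in chosen:
            words = text.split()
            start = rng.randint(0, max(0, len(words) - extract_words))
            extracts.append((genre, words[start:start + extract_words]))
    return extracts


# Invented miniature data: two genres with four 30-word texts each.
demo = {
    "press": [" ".join(f"p{i}" for i in range(30))] * 4,
    "fiction": [" ".join(f"f{i}" for i in range(30))] * 4,
}
for genre, words in sample_extracts(demo, per_genre=2, extract_words=10):
    print(genre, len(words))
```

Sampling within genres, rather than across the whole collection, keeps the proportion of each
genre under the compiler's control while still leaving the choice of individual texts to
chance.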

Because of this structure, a great amount of useful information can be extracted easily from
the corpora. In addition, many sites hold samples of text amounting to hundreds of billions of
words. Many collections are available, such as the Association for Computational Linguistics'
Data Collection Initiative (ACL/DCI), the European Corpus Initiative (ECI), ICAME, the British
National Corpus (BNC), the Linguistic Data Consortium (LDC), the Consortium for Lexical
Research (CLR), and Electronic Dictionary Research (EDR), along with standardization efforts
such as the Text Encoding Initiative (TEI) (Armstrong, 1994).
Apart from lexicography, the application of corpora in applied linguistics also extends to
language teaching, and corpora have benefited a wide variety of fields. Other relevant
applications are in the production of dictionaries and grammars, critical linguistics,
translation, literary studies and stylistics, forensic linguistics, and the design of writer
support packages (Hunston, 2002).
In dictionary-making, the contribution of corpora has been most far-reaching and influential.
The use of corpora has changed dictionaries by placing emphasis on frequency, collocation and
phraseology, variation, lexis in grammar, and authenticity (Hunston, 2002). Recent innovations
in dictionaries include the online Longman Web Dictionary and the Collins COBUILD English
Collocations on CD-ROM.

d. The use of corpora in language teaching

The use of corpora is not uncommon across many disciplines (McEnery & Wilson, 1996: 4). Apart
from lexicography, other possible areas include language teaching, discourse and pragmatics,
semantics, sociolinguistics, historical linguistics, and stylistics. Within language teaching,
there is also a branch known as CALL (Computer-Assisted Language Learning), which provides a
further application of corpora. A study was conducted at Lancaster University on the role of
corpus-based computer software in teaching undergraduates the basic concepts of grammatical
analysis (Hua, 2001). The software, called Cytor, reads an annotated corpus (part-of-speech
tagged or parsed) one sentence at a time. It then hides the annotation and asks the students
to annotate the sentences on their own. In addition, students can call up help in the form of
a list of tag mnemonics, examples from a frequency lexicon, or concordances.

How effective is Cytor at teaching parts of speech? McEnery, Baker and Wilson (1995, cited in
Hua, 2001) compared two groups of students who received different treatments: one was taught
with Cytor and the other through traditional lecturer-based methods. The results suggest that
the computer-taught students performed better than the human-taught students throughout the
term.

Another use of corpora in language teaching and learning is the adoption of classroom
concordancing (data-driven learning) by classroom practitioners, for whom the corpus has
become a source of empirical teaching data (Hua, 2001: 5). One example of a resource for
data-driven learning is Tim Johns's home page at http://web.bham.ac.uk/johnstf/. It provides
an outstanding online bibliographic database of books and articles related to corpora and
language teaching. It also includes online worksheets that use corpora for classroom
teaching. Another interesting resource is the “Grammar Safari” site, developed at
Champaign-Urbana and available at http://deil.lang.uiuc.edu/web.pages/grammarsafari.html,
which provides a careful and thoughtful selection of corpus-based activities. Furthermore,
the Longman Grammar of Spoken and Written English by Douglas Biber et al., which can help
answer students' questions about grammar, draws on a corpus categorized into fiction,
conversation, news, etc.

III. Discussion and Conclusions

From the reviewed literature, it can be seen that dictionaries have been around for centuries.
The first dictionary was made in the 1600s and was based on what were considered difficult
words at that time. During this initial stage, lexicographers faced a number of challenges in
adding words to their dictionaries: selecting words, orthography, pronunciation, etymology and
derivation, analogy, syntax, phraseology, interpretation, and distribution. All this
information had to be gathered manually; lexicographers had to do the hard labor of collecting
slips of paper containing the text that they intended to include in the dictionary. For this
reason, it took roughly 50 years to complete the New English Dictionary, later known as the
Oxford English Dictionary. However, with the advent of corpus linguistics, things began to
change dramatically.
From 1989, together with technological advances in computing, corpora made a significant
contribution to the development of dictionary-making. Corpus linguistics has had a huge
impact on dictionary-making:

a. It significantly reduces the time and heavy work needed to compile a
dictionary, since much of the process is automated and computerized.
b. Dictionaries now reflect how language is used in the real world. Meaning is
derived from corpus samples rather than from the writer's point of view.
c. The frequency of each word in the list can be identified.
d. Much more information can be given for words with many variant meanings, such
as go and take.
e. Collocations are easy to include because words appear with their surrounding
text.
f. 'New' everyday words can quickly be taken into the system.

However, because a corpus is discourse-based, a word appears in it as a haphazard,
arbitrary collection of occurrences. Dictionary makers need to check for contradictions
with the 'real' meaning. It is thus dangerous to depend solely on a corpus. Another
disadvantage of corpus-based dictionaries is that they tend to exclude rare words (those not
appearing in real-world language) such as apothegm. The first corpus-based dictionary was the
Collins COBUILD series of English dictionaries.

Corpora serve linguistic purposes and also preserve texts because of their intrinsic value
(Hunston, 2002). They can also be used as groundwork for research. The way a corpus is stored
allows users to study it non-linearly, and both quantitatively and qualitatively. A corpus
does not contain new information about language; rather, it offers a new viewpoint on
existing information. It shows us a way in which language can be examined. Most available
software packages process corpus data in three ways: showing frequency, phraseology, and
collocation (Hunston, 2002).
Corpora have made life simpler as well as more complex. They make life simpler when, for
example, a translator can quickly compare words that are more or less equivalent, or a teacher
can refer to the corpus to show why a particular usage is incorrect or inexact. On the other
hand, corpora can also make life more complex, in the sense that language turns out to be
patterned in a much finer way than might have been expected, so that a simple and general rule
turns out to apply only in certain contexts (Hunston, 2002).
The term modern corpus is reserved for collections of texts that are stored and accessed
electronically. Electronic corpora are usually larger than the small, paper-based collections
previously used to study aspects of language. Electronic corpora gave rise to recent
innovations in dictionaries, which include the online Longman Web Dictionary and the Collins
COBUILD English Collocations on CD-ROM.

References:

Armstrong, S. (1994). Using Large Corpora. Cambridge: MIT Press.

Baugh, A. C. & Cable, T. (2002). A History of the English Language. Oxon: Routledge.

Considine, J. (1996). The Meanings, deduced logically from etymology. In Gellerstam, M.;
Jeker Jäborg; Sven-Göran Malmgren; Kerstin Norén; Lena Rogström and
Catarina Röjder Pammehl (eds.), Euralex '96 Proceedings. Papers submitted to the Seventh
EURALEX International Congress on Lexicography in Göteborg, Sweden. Göteborg: Göteborg
University, Department of Swedish, 365-371.

David, C. (1992). An Encyclopedic Dictionary of Language and Languages. Oxford: Oxford
University Press. Retrieved from:
http://www.tuchemintz.de/phil/english/chairs/linguist/independent/kursmaterialien/language_computers/whatis.htm

Gries, S. T. (2009). 'What is Corpus Linguistics?', Language and Linguistics Compass, Vol. 3,
pp. 1-14.

Hua, T. K. (2001). Corpora: Characteristics and Related Studies. Kuala Lumpur: Maziza Sdn
Bhd.

Hunston, S. (2002). Corpora in Applied Linguistics. UK: Cambridge University Press.

Jackson, H. (2002). Lexicography, an Introduction. Oxon: Routledge.

Johnson, S. (1747). The Plan of a Dictionary of the English Language.

Lawler, J. M. & Dry, H. A. (1998). Using Computers in Linguistics: A Practical Guide. London:
Routledge.

Lindquist, H. (2009). Corpus Linguistics and the Description of English. Edinburgh: Edinburgh
University Press.

Mason, O. (2000). Programming for Corpus Linguistics: How to Do Text Analysis with Java.
Edinburgh: Edinburgh University Press.

McEnery, T. & Wilson, A. (1996). Corpus Linguistics. Edinburgh: Edinburgh University Press.

Meyer, C. F. (2002). English Corpus Linguistics. Cambridge: Cambridge University Press.

Siemens, R. G. (1994). Robert Cawdrey: A Table Alphabetical of Hard Usual English Words
(1604). Retrieved from http://www.library.utoronto.ca/utel/ret/cawdrey/cawdrey0.html

Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford: Oxford University Press.

Teubert, W. (2004). 'Language and corpus linguistics'. Lexicology and Corpus
Linguistics. London: Continuum.

Tognini-Bonelli, E. (2001). Corpus Linguistics at Work. Amsterdam: John Benjamins
Publishing Co.

WordSmith. (2011, October 15). In Wikipedia, The Free Encyclopedia. Retrieved April 22,
2012, from http://en.wikipedia.org/w/index.php?title=WordSmith&oldid=455732307
