2. What is Corpus linguistics?
Corpus linguistics is the study of language as
expressed in samples (corpora) of "real world"
text. This method represents a digestive
approach to deriving a set of abstract rules by
which a natural language is governed or else
relates to another language. Originally done
by hand, corpora are now largely derived by
an automated process.
3. One of the main contributions of corpus
linguistics is in the area of exploring patterns
of language use. Corpus linguistics provides an
extremely powerful tool for the analysis of
natural language and of how its use varies in different
situations.
4. As a result of these advances, there are typically
four features that are seen as characteristic of
corpus-based analyses of language:
o It’s empirical, analyzing the actual patterns of use
in natural texts.
o It utilizes a large and principled collection of natural
texts, known as a ‘corpus’, as the basis for analysis
o It makes extensive use of computers for analysis,
using both automatic and interactive techniques
o It depends on both quantitative and qualitative
analytical techniques
5. Corpus Design and Compilation
A corpus is a large and principled collection of
texts stored in electronic format. There is no
minimum size for a text collection to be
considered a corpus. Electronic storage is a
significant development, as it enables researchers
all over the world to access the same sets of data,
which not only encourages a higher degree of
accountability in data analysis, but also
permits collaborative work and follow-up
studies by different researchers.
6. Types of Corpora
There are as many types of corpora as there are
research topics in linguistics. General corpora,
such as the Brown Corpus, the LOB, or the BNC,
aim to represent language in its broadest sense
and to serve as a widely available resource for
baseline or comparative studies of general
linguistic features.
A general corpus is designed to be balanced and
include language samples from a wide range of
registers or genres, including both fiction and
nonfiction in all their diversity.
7. Corpus Compilation
When creating a corpus, data collection involves
obtaining or creating electronic versions of the
target texts, and storing and organizing them.
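The storing and organizing step can be sketched in a few lines of Python. This is only an illustrative sketch, not a standard tool: the function name `build_manifest`, the one-folder-per-register layout, and the whitespace-based token count are all assumptions made for the example.

```python
from pathlib import Path


def build_manifest(corpus_dir):
    """Return one record per .txt file: relative path, register, token count."""
    root = Path(corpus_dir)
    manifest = []
    for path in sorted(root.rglob("*.txt")):
        text = path.read_text(encoding="utf-8")
        manifest.append({
            "file": path.relative_to(root).as_posix(),
            # Assumption: the parent folder name encodes the register,
            # e.g. corpus/fiction/a.txt belongs to the "fiction" register.
            "register": path.parent.name,
            # Rough token count: whitespace-separated strings.
            "tokens": len(text.split()),
        })
    return manifest
```

A manifest like this makes it easy to check how many texts and words each register contributes before any analysis begins.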
Written corpora are far less labor intensive to
collect than spoken corpora.
The data collection phase of building a spoken
corpus is lengthy and expensive. The first step
is to decide on a transcription system.
8. Word Counts and Basic Corpus Tools
There are many levels of information that can be
gathered from a corpus. These levels start with
simple word lists, which can already reveal
linguistic association patterns.
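A simple word list can be produced with Python's standard library alone. The sketch below is a minimal illustration: the `tokenize` and `word_list` names, the lowercasing, and the regular expression used to find words are assumptions for the example, and the sample sentence is invented.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens.

    Assumption: a word is a run of letters a-z, optionally followed by
    an apostrophe and more letters (e.g. "don't").
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_list(text, top=None):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokenize(text)).most_common(top)


sample = "The cat sat on the mat. The dog sat too."
for word, count in word_list(sample, top=3):
    print(word, count)
```

Real corpus tools make more careful tokenization decisions (hyphens, numbers, contractions), and those choices affect the counts.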
The tools that are used for these analyses range
from basic concordance packages to complex
interactive computer programs.
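What a basic concordance package does can be sketched as a keyword-in-context (KWIC) search. The function below is a simplified illustration, not the behavior of any particular package: the name `concordance`, the whitespace tokenization, and the `width` parameter are assumptions for the example.

```python
def concordance(text, keyword, width=3):
    """Return (left context, match, right context) for each hit of keyword.

    Tokens are whitespace-separated; surrounding punctuation is ignored
    when comparing a token with the keyword, and case is ignored.
    """
    tokens = text.split()
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token.strip(".,;:!?\"'()").lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


sample = "The cat sat on the mat. The dog sat too."
for left, match, right in concordance(sample, "sat", width=2):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>12} [{match}] {right}")
```

Lining up the keyword in a column lets the analyst scan the words that occur to its left and right.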