2. Corpus linguistics is the study of language as expressed in
samples (corpora) of "real world" text.
DEFINITION
3. HENRY KUCERA AND W. NELSON FRANCIS
- published Computational Analysis of Present-Day American
English (1967)
- the work contains a variety of computational analyses, combining
elements of linguistics, language teaching, psychology,
statistics, and sociology
RANDOLPH QUIRK
- published "Towards a description of English Usage" (1960), in
which he introduced the Survey of English Usage.
HISTORY
4. HOUGHTON-MIFFLIN
- published the American Heritage Dictionary (AHD), the first
dictionary to be compiled using corpus linguistics
- approached Kucera to supply a million-word, three-line citation
base for the dictionary
- the AHD combines prescriptive elements with
descriptive information.
COLLINS
- published the COBUILD monolingual learner's dictionary
- designed for users learning English as a foreign
language (compiled using the Bank of English)
- the Survey of English Usage Corpus was used in the
development of the Comprehensive Grammar of
English
5. MONTREAL FRENCH PROJECT
- The first computerized corpus of transcribed spoken
language
- contains one million words
ANDERSEN-FORBES
- a computerized corpus (database) of the Hebrew Bible
- every clause is parsed using graphs representing
seven levels of syntax, and each segment is tagged
with seven fields of information
THE QURANIC ARABIC CORPUS
- an annotated corpus for the Classical Arabic
language of the Quran
- a recent project with multiple layers of annotation,
including morphological segmentation,
part-of-speech tagging, and syntactic analysis using
dependency grammar
7. Annotation consists of the application of a scheme to
texts.
Annotations may include structural mark-up,
part-of-speech tagging, parsing, and numerous other
representations.
1) ANNOTATION
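To make the idea of applying a scheme to a text concrete, here is a minimal sketch in Python. It assumes a toy part-of-speech scheme stored as a small word-to-tag lexicon; the names TOY_LEXICON, tokenize, and annotate are hypothetical and not taken from any real tagger, and real annotation schemes are far richer than a lookup table.

```python
import re

# Hypothetical toy scheme: a tiny word-to-tag lexicon (an assumption for
# illustration, not a real tagset or a real tagger).
TOY_LEXICON = {
    "the": "DET",
    "a": "DET",
    "corpus": "NOUN",
    "text": "NOUN",
    "contains": "VERB",
    "annotated": "ADJ",
}


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def annotate(text, lexicon=TOY_LEXICON, unknown="UNK"):
    """Apply the scheme to the text: pair every token with its tag."""
    return [(token, lexicon.get(token, unknown)) for token in tokenize(text)]


annotated = annotate("The corpus contains annotated text.")
for token, tag in annotated:
    print(f"{token}\t{tag}")
```

Words missing from the lexicon receive the placeholder tag UNK, which makes gaps in the scheme visible instead of hiding them.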
8. Abstraction consists of the translation (mapping) of
terms in the scheme to terms in a theoretically
motivated model or dataset.
It typically includes linguist-directed search, but may
also include, for example, rule-learning for parsers.
2) ABSTRACTION
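The mapping step can be sketched as follows. This is only an illustration under stated assumptions: the fine-grained tags follow Penn Treebank naming, the coarse categories and the TAG_TO_CATEGORY table are an illustrative choice, and the functions abstract and find_pairs are hypothetical. The search for adjective-noun pairs stands in for a simple linguist-directed search.

```python
# Hypothetical mapping from fine-grained annotation tags (Penn Treebank
# style) to the coarse categories of an assumed model.
TAG_TO_CATEGORY = {
    "DT": "DET",
    "JJ": "ADJ",
    "NN": "NOUN",
    "NNS": "NOUN",
    "VBP": "VERB",
    "VBZ": "VERB",
    "VBD": "VERB",
}


def abstract(tagged_tokens, mapping=TAG_TO_CATEGORY, default="OTHER"):
    """Translate each scheme tag into its model category."""
    return [(word, mapping.get(tag, default)) for word, tag in tagged_tokens]


def find_pairs(abstracted, first, second):
    """Return adjacent word pairs whose categories are (first, second)."""
    return [
        (w1, w2)
        for (w1, c1), (w2, c2) in zip(abstracted, abstracted[1:])
        if c1 == first and c2 == second
    ]


# A small hand-tagged sentence, invented for this example.
tagged = [
    ("the", "DT"), ("large", "JJ"), ("corpora", "NNS"), ("contain", "VBP"),
    ("spoken", "JJ"), ("language", "NN"), (".", "."),
]

abstracted = abstract(tagged)
print(find_pairs(abstracted, "ADJ", "NOUN"))
```

The list of pairs returned by the search is the kind of dataset that later statistical analysis would work on.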
9. Analysis consists of statistically probing, manipulating
and generalising from the dataset.
It might include statistical evaluations, optimisation of
rule-bases, or knowledge discovery methods.
3) ANALYSIS
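One common kind of statistical evaluation is comparing how often a word occurs in two sub-corpora. The sketch below is an assumption-laden illustration, not a prescribed method: the counts are invented, the function names per_million and chi_square_2x2 are hypothetical, and the test used is the plain Pearson chi-square statistic for a 2x2 table without continuity correction.

```python
def per_million(count, total_tokens):
    """Normalise a raw count to a frequency per million tokens."""
    return count * 1_000_000 / total_tokens


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    a, c: occurrences of the word in corpus A and corpus B
    b, d: all other tokens in corpus A and corpus B
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column total must be non-zero")
    return n * (a * d - b * c) ** 2 / denominator


# Invented counts for illustration only.
hits_a, size_a = 10, 100
hits_b, size_b = 30, 100

statistic = chi_square_2x2(hits_a, size_a - hits_a, hits_b, size_b - hits_b)
print(per_million(hits_a, size_a), per_million(hits_b, size_b))
print(statistic)
```

With one degree of freedom, a statistic above 3.84 corresponds to a difference that is significant at the 0.05 level; whether chi-square is the right test for a given dataset is a separate judgment.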