Corpus Linguistics
What is Corpus linguistics?
Corpus linguistics is the study of language as
  expressed in samples (corpora) or "real world"
  text. This method represents a digestive
  approach to deriving a set of abstract rules by
  which a natural language is governed or else
  relates to another language. Originally done
  by hand, corpora are now largely derived by
  an automated process.
One of the main contributions of corpus
 linguistics is in the area of exploring patterns
 of language use. Corpus linguistics provides an
 extremely powerful tool for the analysis of
 natural language and of how language use varies
 in different situations.
As a result of these advances there are typically
  four features that are seen as characteristic of
  corpus-based analyses of language:
o It is empirical, analyzing the actual patterns of use
  in natural texts.
o It utilizes a large and principled collection of natural
  texts, known as a ‘corpus’, as the basis for analysis.
o It makes extensive use of computers for analysis,
  using both automatic and interactive techniques.
o It depends on both quantitative and qualitative
  analytical techniques.
Corpus Design and Compilation
A corpus is a large and principled collection of
  texts stored in electronic format. There is no
  minimum size for a text collection to be
  considered a corpus. Electronic storage is a
  significant development, as it enables researchers
  all over the world to access the same sets of data.
  This not only encourages a higher degree of
  accountability in data analysis, but also
  permits collaborative work and follow-up
  studies by different researchers.
Types of Corpora
There are as many types of corpora as there are
  research topics in linguistics. General corpora,
  such as the Brown Corpus, the LOB, or the BNC,
  aim to represent language in its broadest sense
  and to serve as a widely available resource for
  baseline or comparative studies of general
  linguistic features.
A general corpus is designed to be balanced and
  to include language samples from a wide range of
  registers or genres, including both fiction and
  nonfiction in all their diversity.
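One rough way to inspect balance is to see how much of the corpus each register contributes. The sketch below is only an illustration: the register labels, the sample sentences, and the function names are invented for this example, and splitting on whitespace is a deliberately crude way to count tokens.

```python
from collections import Counter

# Hypothetical mini-corpus: (register label, text) pairs invented for illustration.
SAMPLES = [
    ("fiction", "The old house stood silent at the end of the lane."),
    ("news", "The council voted on the new budget late on Tuesday."),
    ("academic", "The results suggest a strong effect of register on word choice."),
    ("fiction", "She opened the door and stepped into the rain."),
]


def tokens_per_register(samples):
    """Return the number of whitespace-separated tokens in each register."""
    totals = Counter()
    for register, text in samples:
        totals[register] += len(text.split())
    return totals


def register_shares(samples):
    """Return each register's share of all tokens, as a fraction between 0 and 1."""
    totals = tokens_per_register(samples)
    grand_total = sum(totals.values())
    return {register: count / grand_total for register, count in totals.items()}


if __name__ == "__main__":
    for register, share in sorted(register_shares(SAMPLES).items()):
        print(f"{register:10s} {share:.1%}")
```

A register whose share is far larger than the others would suggest that the collection is skewed toward it. What counts as balanced depends on the purpose of the corpus.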
Corpus Compilation
When creating a corpus, data collection involves
  obtaining or creating electronic versions of the
  target texts, and storing and organizing them.
  Written corpora are far less labor-intensive to
  collect than spoken corpora.
The data collection phase of building a spoken
  corpus is lengthy and expensive. The first step
  is to decide on a transcription system.
Word Counts and Basic Corpus Tools
There are many levels of information that can be
  gathered from a corpus. These levels range
  from simple word lists to analyses that can
  reveal linguistic association patterns.
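A simple word list can be sketched in a few lines. This is a minimal illustration and not the output of any particular corpus tool: the function name `word_list` and the sample sentence are assumptions made for the example, and the tokenizer is deliberately naive (lowercased runs of letters and apostrophes).

```python
import re
from collections import Counter


def word_list(text):
    """Return (word, count) pairs sorted by descending count, then alphabetically."""
    # Lowercase the text and keep runs of letters and apostrophes (naive tokenizer).
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


sample = "The cat sat on the mat. The dog sat too."
for word, count in word_list(sample)[:3]:
    print(word, count)
```

On a real corpus the top of such a list is usually dominated by function words such as "the" and "of", which is why word lists are often a starting point for further analysis and not an end result.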
The tools that are used for these analyses range
  from basic concordance packages to complex
  interactive computer programs.
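The core of a basic concordance display is the keyword-in-context (KWIC) line: every occurrence of a search word shown with a few words on either side. The sketch below assumes a hypothetical function `concordance` and a toy sentence. It is not modeled on any specific concordance package, and real tools add sorting, regular-expression search, and links back to the source text.

```python
import re


def concordance(text, keyword, width=3):
    """Return KWIC lines: up to `width` words of context on each side of every hit."""
    words = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # The keyword is bracketed so it stands out in each line.
            lines.append(f"{left} [{word}] {right}".strip())
    return lines


sample = "The cat sat on the mat. The dog sat too."
for line in concordance(sample, "sat", width=2):
    print(line)
```

Reading the hits together makes recurring neighbors of the keyword easy to spot, which is the qualitative side of the analysis that complements raw counts.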
