Language Variation Suite Toolkit
Olga Scrivner, Rafael Orozco, Manuel Díaz-Campos
NWAV47, October 18, 2018
Indiana University and Louisiana State University
Workshop Information
What You Will Learn
Part 1 Cross-disciplinary Methods in Sociolinguistics
Part 2 Web Interface - a novel approach for interactive
socio-analysis
Part 3 Working with Structured Data: LVS
Part 4 Working with Unstructured Data: ITMS
2
Our Materials - Web Site
https://languagevariationsuite.wordpress.com/
3
What We Need
- Computer
- WiFi
- Smartphone (optional)
- R and RStudio (not really)
4
Cross-disciplinary Methods
Our Objectives
Goal: Optimize sociolinguistic analysis by using a variety of
cross-disciplinary methods
1. Introduce a novel (socio-)linguistic toolkit for research and
as a teaching tool
2. Develop practical skills via an interactive visual interface
3. Understand and interpret advanced statistical and data
mining models
6
Cross-disciplinary Research
[Diagram of fields: Corpus Linguistics, Computational Linguistics, Sociolinguistics, Data Science, Statistics]
7
Cross-disciplinary Research: Methods
[Diagram of fields and methods. Fields: Corpus Linguistics, Computational Linguistics, Sociolinguistics, Data Science, Statistics. Methods: cleaning, pre-processing, creating corpora, parametric and non-parametric models, clustering, topic modeling]
8
Cross-disciplinary Methods: A Closer Look
Random Forest - a more stable method than stepwise variable selection
(Tagliamonte and Baayen, 2012, 25)
Conditional Tree - a hierarchical tree of significant factors and
their relationship with other predictors
9
Cross-disciplinary Methods: A Closer Look
Clustering - a grouping of variables or individuals or
documents (Gries and Hilpert, 2008, 69)
Topic Modeling - a discovery of language patterns and word usage
over time and across genres (Blei, 2012,
78)
10
How To Unify All Methods: Data Science Workflow
1. Import data into R: read_csv(), read_lines() [even Praat
files!]
2. Tidy data - pre-processing
3. Transform with dplyr [select, subset, recode...]
4. Visualize with ggplot and plotly
5. Model with glm, lda, randomForest...
11
Tidyverse - a system of R packages for data manipulation,
exploration, and visualization that share a common design
philosophy (Wickham and Grolemund, 2017)
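The five workflow steps can be sketched in a few lines. This is a minimal illustration in base R, chosen so that it runs without installing anything; in the tidyverse the same steps would use readr, dplyr, and ggplot2. The token data and the column names (r_use, store, style) are hypothetical and stand in for an uploaded CSV file.

```r
# Hypothetical token data standing in for a real CSV file.
csv_text <- "r_use,store,style
retention,Saks,normal
retention,Saks,emphatic
retention,Saks,normal
deletion,Saks,emphatic
retention,Macys,normal
retention,Macys,emphatic
deletion,Macys,normal
deletion,Macys,emphatic
retention,Kleins,emphatic
deletion,Kleins,normal
deletion,Kleins,normal
deletion,Kleins,emphatic"

# 1. Import (read.csv also accepts a file path instead of text =).
tokens <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# 2. Tidy: drop rows with missing values.
tokens <- tokens[complete.cases(tokens), ]

# 3. Transform: keep only the normal-style tokens.
normal_tokens <- subset(tokens, style == "normal")

# 4. Visualize: bar plot of retention vs. deletion counts.
counts <- table(normal_tokens$r_use)
barplot(counts, main = "R-use in normal style")

# 5. Model: logistic regression of r_use on store.
model <- glm(r_use ~ store, data = tokens, family = binomial)
print(coef(model))
```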
Sociolinguistic Data - Data Analytics
“Analytics is the critical technology
needed to bring value out of data”
(Anonymous)
12
Sociolinguistic Data - Visual Analytics
“The science of analytical reasoning
facilitated by visual interactive interfaces”
(Thomas and Cook, 2005)
“Visual analytics integrates new computational and
theory-based tools with innovative interactive techniques
and visual representations to enable human-information
discourse” (Thomas and Cook, 2005)
[Figure: conditional inference tree with VO/OV as the dependent variable. Splits: PositionSentence (node 1, p < 0.001), Heaviness (node 2, p = 0.003), Period (node 3, p < 0.001), Period (node 7, p < 0.001), Focus (node 9, p < 0.001), Main_Verb_Structure (node 11, p < 0.001). Terminal nodes: 4 (n = 81), 5 (n = 119), 6 (n = 181), 8 (n = 221), 10 (n = 66), 12 (n = 43), 13 (n = 265)]
13
Introduction to LVS
What is LVS?
Language Variation Suite
It is a Shiny web application designed for data analysis in
sociolinguistic research.
It can be used for:
Processing spreadsheet data
Reporting in tables and graphs
Analyzing means, regression, conditional trees ...
(and much more)
15
Background
LVS is built in R using the Shiny package:
1. R - a free programming language for statistical computing
and graphics
2. Shiny App - a web application framework for R
Computational power of R + Web interactivity
16
Background
http://littleactuary.github.io/blog/Web-application-framework-with-Shiny/
17
Workspace
Browser
Chrome, Firefox, Safari - recommended
Internet Explorer may cause instability issues
Accessibility
PC, Mac, Linux
◦ Data files will be uploaded from any location on your
computer
Smart Phone
◦ Data files must be on a cloud platform connected to your
phone account (e.g. Dropbox)
18
Server
Since LVS is hosted on a server, Shiny idle time-out settings may
stop the application when it is left inactive (the page will grey out).
Solution: Click reload and re-upload your CSV file
19
Data Preparation
Data Preparation
Important things to consider before data entry:
File format:
◦ Comma separated value (CSV) - faster processing
◦ Excel format will slow processing
Column names should not contain spaces
◦ Permitted: non-accented characters, numbers, underscore,
hyphen, and period
One column must contain your dependent variable
The rest of the columns contain independent variables
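The column-name rule can be checked before upload. The sketch below is one possible check in base R, not part of LVS itself; the column names are hypothetical, and the pattern simply encodes the permitted characters listed above.

```r
# Column names as they might appear in a spreadsheet (hypothetical).
column_names <- c("r_use", "store", "lexical item", "a\u00f1o", "style.2")

# Permitted: unaccented letters, digits, underscore, hyphen, period.
# perl = TRUE keeps the A-Z and a-z ranges restricted to ASCII letters.
is_valid <- grepl("^[A-Za-z0-9_.-]+$", column_names, perl = TRUE)

# Names that need renaming before upload.
print(column_names[!is_valid])
```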
21
Terminology Review
a. Categorical/Nominal - non-numerical data with two
values
◦ yes - no; deletion - retention; perfective - imperfective
b. Continuous - numerical data
◦ duration, age, chronological period
c. Multinomial - non-numerical data with three or more
values
◦ deletion - aspiration - retention
d. Ordinal - scale
22
Test Types
Dependent     Independent                        Test
Continuous    Continuous                         t-test
Categorical   Categorical                        Chi-square
Continuous    Multiple Categorical/Continuous    Linear Regression
Categorical   Multiple Categorical/Continuous    Logistic Regression (Binary or Multinomial)
(Dickinson, 2014)
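As an illustration of one row of the table (categorical dependent, categorical independent), a chi-square test can be run in base R. The counts below are hypothetical and are not taken from the workshop files.

```r
# Hypothetical counts: rows are stores, columns are variants of (r).
counts <- matrix(c(10, 90,
                   60, 40),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(store = c("Kleins", "Saks"),
                                 variant = c("retention", "deletion")))

# Categorical dependent + categorical independent -> chi-square test.
result <- chisq.test(counts)
print(result)
```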
23
24
Questions?
Comments?
Workshop Files
https://languagevariationsuite.wordpress.com/
1. categoricaldata.csv: categorical dependent - Labov New
York 1966 study
2. continuousdata.csv: continuous dependent - Intervocalic
/d/ in Caracas corpus (Díaz-Campos and Gradoville,
2011; Scrivner and Díaz-Campos, 2016)
3. LVS web site:
www.languagevariationsuite.com
25
LVS Structure
Language Variation Suite - Structure
1. Data
◦ Upload file, data summary, adjust data, cross tabulation
2. Visual Analysis
◦ Plotting, cluster classification
3. Inferential Statistics
◦ Modeling, regression, conditional trees, random forest
27
Upload File
Upload movie_metadata.csv
28
Excel Format
1. Slow processing
2. Requires the name of your excel sheet
29
Tip: Save Your Excel File as CSV
To optimize speed, save as CSV prior to upload
30
Upload File
Upload categoricaldata.csv
31
Uploaded Dataset
The data content is imported as a table and allows for sorting
columns.
32
Summary
Summary provides a quantitative summary for each variable,
e.g. frequency count, mean, median.
33
Data Structure
1. Total number of observations (rows)
2. Number of variables (columns)
3. Variable types
◦ Factor - categorical values
◦ Num - numeric values (0.95, 1.05)
◦ Int - integer values (1, 2, 3)
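The same information can be inspected directly in R. This is a small sketch with a hypothetical data frame; stringsAsFactors = TRUE is set explicitly because recent R versions otherwise keep text columns as character rather than factor.

```r
# Hypothetical data with one column of each type.
tokens <- data.frame(
  r_use    = c("retention", "deletion", "retention", "deletion"),
  duration = c(0.95, 1.05, 0.80, 1.20),
  age      = c(25L, 41L, 33L, 58L),
  stringsAsFactors = TRUE
)

print(nrow(tokens))     # total number of observations (rows)
print(ncol(tokens))     # number of variables (columns)
str(tokens)             # variable types: Factor, num, int
print(summary(tokens))  # frequency counts, mean, median per variable
```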
34
Cross Tabulation
Cross-tabulation examines the relationship between variables.
35
Variables
36
Cross-Tabulation Output
Raw frequency / Proportion by column / Proportion across
row
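The three outputs correspond to standard base R calls. A minimal sketch with hypothetical tokens:

```r
# Hypothetical tokens: one row per observation.
tokens <- data.frame(
  r_use = c("retention", "deletion", "retention",
            "deletion", "deletion", "retention"),
  store = c("Saks", "Saks", "Macys", "Macys", "Kleins", "Saks")
)

# Raw frequency: rows are variants, columns are stores.
tab <- table(tokens$r_use, tokens$store)

# Proportion by column: each column sums to 1.
by_column <- prop.table(tab, margin = 2)

# Proportion across row: each row sums to 1.
by_row <- prop.table(tab, margin = 1)

print(tab)
print(round(by_column, 2))
print(round(by_row, 2))
```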
37
38
Questions?
Comments?
Visual Analysis
Language Variation Suite - Structure
1. Data
◦ Upload file, data summary, adjust data, cross tabulation
2. Visual Analysis
◦ Plotting, cluster classification
3. Inferential statistics
◦ Modeling, regression, varbrul analysis, conditional trees,
random forest
40
Adjusting Browser - Layout
Shiny pages are fluid and reactive.
41
One Variable Plot
42
Two Variables Plot
43
Saving Plot
44
Three Variables Plot
45
Cluster Plot
Classification of data into sub-groups is based on pairwise
similarities
Groups are clustered in the form of a tree-like dendrogram
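A dendrogram of this kind can be sketched with base R's hierarchical clustering. The retention rates below are hypothetical, and the Euclidean distance and average linkage are assumptions for illustration; the settings LVS uses are not stated here.

```r
# Hypothetical retention rates per store in two styles.
rates <- matrix(c(0.62, 0.68,
                  0.45, 0.55,
                  0.10, 0.20),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("Saks", "Macys", "Kleins"),
                                c("normal", "emphatic")))

# Pairwise (Euclidean) distances between stores.
distances <- dist(rates)

# Agglomerative clustering, drawn as a tree-like dendrogram.
tree <- hclust(distances, method = "average")
plot(tree, main = "Stores clustered by retention rate")

# Cut the tree into two sub-groups.
groups <- cutree(tree, k = 2)
print(groups)
```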
46
Cluster Plot
47
Cluster Plot
Saks (upper middle-class store), Macy’s (middle-class store), Kleins
(working-class)
48
49
Questions?
Comments?
Inferential Analysis
Inferential Statistics
51
How to Create a Regression Model
Step 1 Modeling - create a model with dependent and
independent variables
Step 2 Regression - specify the type of regression (fixed,
mixed) and type of dependent variable (binary,
continuous, multinomial)
Step 3 Stepwise Regression - compare models
(Log-likelihood, AIC, BIC)
Step 4 Conditional Trees - apply non-parametric tests to
the model
53
Modeling
54
We are interested in RETENTION (= Application)
Regression Types
Model
a.) Fixed effect
b.) Mixed effect - individual speaker/token variation (within
group)
Type of Dependent Variable
a.) Binary/categorical (only two values)
b.) Continuous (numeric)
c.) Multinomial - categorical with more than two values
55
Regression
56
Model Output
57
Interpretation
Deletion is the reference value
Positive coefficient - positive effect
Negative coefficient - negative effect
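The role of the reference value can be seen in a small fixed-effects logistic regression. This is a sketch with hypothetical counts, not the workshop data: deletion is set as the reference value of the dependent variable, so a positive coefficient means a higher chance of retention.

```r
# Hypothetical counts, expanded to one row per token.
tokens <- data.frame(
  store = rep(c("Kleins", "Saks"), times = c(50, 50)),
  r_use = c(rep(c("retention", "deletion"), times = c(10, 40)),
            rep(c("retention", "deletion"), times = c(30, 20)))
)

# The first factor level is the reference value.
tokens$r_use <- relevel(factor(tokens$r_use), ref = "deletion")
tokens$store <- relevel(factor(tokens$store), ref = "Kleins")

# Binary dependent variable, fixed effects only.
model <- glm(r_use ~ store, data = tokens, family = binomial)
print(summary(model)$coefficients)

# Positive: Saks favors retention compared to Kleins.
store_effect <- coef(model)[["storeSaks"]]
```

With these counts the coefficient equals the log of the odds ratio, log((30/20) / (10/40)) = log(6).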
58
Interpretation - RETENTION
Lexical item Fourth has a negative effect on retention compared
to Floor and is significant
Normal style has a slightly negative effect on retention but its
coefficient is not significant
Macy’s and Saks have a positive and significant effect on
retention. Saks (upper middle class store) is more significant
than Macy’s (middle class store)
http://www.free-online-calculator-use.com/scientific-notation-converter.html
59
Exponential notation: 1.48e-8 = 0.0000000148
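R can do the conversion itself, so an online converter is optional. A short sketch using the value from the slide:

```r
p_value <- 1.48e-8

# Print without exponential notation.
print(format(p_value, scientific = FALSE))

# Compare against a significance threshold.
print(p_value < 0.001)
```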
60
Questions?
Comments?
Conditional Tree
Conditional tree: a simple non-parametric regression analysis,
commonly used in social and psychological studies
Linear regression: all information is combined linearly
Conditional tree regression: visual splitting to capture
interaction between variables
Recursive splitting (tree branches)
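The idea behind one splitting step can be illustrated in base R. This is a deliberately simplified sketch, not the conditional inference algorithm itself: it uses a plain chi-square test as a stand-in for the association test, and the data are hypothetical. The predictor most strongly associated with the response defines the split, and the same step is then repeated within each branch.

```r
# Hypothetical tokens: store matters, style does not.
tokens <- data.frame(
  r_use = rep(c("retention", "deletion"), times = c(40, 60)),
  store = c(rep(c("Saks", "Kleins"), times = c(30, 10)),   # retention rows
            rep(c("Saks", "Kleins"), times = c(20, 40))),  # deletion rows
  style = rep(c("normal", "emphatic"), times = 50)
)

# p-value of the association between each predictor and the response.
predictors <- c("store", "style")
p_values <- sapply(predictors, function(name) {
  chisq.test(table(tokens[[name]], tokens$r_use))$p.value
})
print(p_values)

# The predictor with the smallest p-value defines the first split.
first_split <- names(which.min(p_values))
print(first_split)
```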
61
Conditional Tree - Tagliamonte and Baayen (2012)
1. The distribution of was/were is split in two groups by
individuals.
2. The variant were occurs significantly more frequently with the
first group.
62
Conditional Tree - Tagliamonte and Baayen (2012)
1. Polarity is relevant to the second group of individuals.
2. The variant were occurs significantly more often with negative
polarity
63
Conditional Tree - Tagliamonte and Baayen (2012)
1. Affirmative Polarity is conditioned by Age.
2. The variant was is produced significantly more often by
Individuals of 46 and younger.
64
Conditional Tree
65
Conditional Tree
1. Store is the most significant factor for R-use
◦ Kleins (working class store) - more R-deletion
2. R-use in Macy’s and Saks is conditioned by lexical item:
◦ Floor shows more R-retention than Fourth
3. Style is not significant
66
Model Output
67
Random Forest
1. Variable importance for predictors
2. Robust technique with small n large p data
3. All predictors considered jointly (allows for inclusion of
correlated factors)
68
Random Forest
69
Random Forest
1. Store is the most important predictor
2. Lexical Item is the second predictor
3. Style is irrelevant: its importance is close to zero and to the
red dotted line (cut-off value).
70
Mixed Effects
Fixed and Mixed Models
Fixed Effects Model : All predictors are treated as independent.
Underlying assumption - no group-internal
variation between speakers or tokens
Mixed Effects Model : Allows for evaluation of individual- and
group-level variation
72
Fixed and Mixed Models
Fixed Regression Model - ignoring individual variations
(speakers or words) may lead to Type I Error:
“a chance effect is mistaken for a real difference
between the populations”
Mixed Regression Model - prone to Type II Error:
“if speaker variation is at a high level, we cannot
discern small population effects without a large
number of speakers” (Johnson 2009, 2015)
73
Mixed Effect Regression
Mixed Model = fixed effects + random effects
Fixed-effect factor - “repeatable and a small number of levels”
◦ e.g. event verb, stative verb; male, female
Random-effect factor - “a non-repeatable random sample from
a larger population” (Wieling 2012)
◦ e.g. walk, sleep, study, finish, eat, etc.; speaker1, speaker2, speaker3, etc.
74
Preparing for Mixed Model
1. Download continuousdata.csv
2. Upload this file on LVS
75
Data - Uploaded Dataset
76
Mixed Effect Modeling
77
NULL when the dependent variable is continuous
Fixed Effects - independent variables
Mixed Effect Modeling
78
Mixed Effects - group-internal variation
Regression Results
79
Interpretation - Random Effects
1. Standard Deviation: a measure of the variability for each
random effect (speakers and tokens)
2. Residual: random variation that is not due to speakers or
tokens (residual error)
80
Interpretation - Fixed Effects
1. Estimate/coefficient: reported in log-odds (negative or
positive)
2. P-value: tells you if the level is significant
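For a logistic model, log-odds estimates can be converted to odds and probabilities to make them easier to read. A brief sketch; the estimates below are hypothetical.

```r
# Hypothetical estimates in log-odds.
log_odds <- c(-1.39, 0, 1.79)

# Odds: values below 1 disfavor, values above 1 favor the application value.
odds <- exp(log_odds)

# Probabilities: plogis() is the inverse of the logit function.
probability <- plogis(log_odds)

print(round(odds, 2))
print(round(probability, 2))
```

A log-odds value of 0 corresponds to odds of 1 and a probability of 0.5, which is why the sign of the estimate indicates the direction of the effect.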
81
Bonus - Word Clouds
82
Frequency Plot
83
Frequency Plot
84
Applied Projects with LVS
Publications
- Scrivner, Olga, and Díaz-Campos, M. 2016. Language
Variation Suite: A theoretical and methodological
contribution for linguistic data analysis. Proceedings of the
Linguistic Society of America, 1, 29–1
- Estrada, Monica. 2016. El tú no es de nosotros, es de otros
países: Usos del voseo y actitudes hacia él en el castellano
hondureño. Master's thesis. Louisiana State University
- Orozco, Rafael. 2018. El castellano del Caribe colombiano
en la ciudad de Nueva York: El uso variable de sujetos
pronominales. Studies in Hispanic and Lusophone
Linguistics 11(1): 89–129
86
Students' Perspective: Feedback
“This program makes analysis so much more accessible to
someone like myself who is new to statistical analysis”
“I realize it is actually a tool of analysis every linguist will
appreciate a lot for quantitative analysis”
87
Teachers' Perspective
88
References I
Baayen, Harald. 2008. Analyzing linguistic data: A practical introduction to statistics. Cambridge: Cambridge University Press.
Bentivoglio, Paola, and Mercedes Sedano. 1993. Investigación sociolingüística: sus métodos aplicados a una experiencia venezolana. Boletín de Lingüística 8: 3–35.
Díaz-Campos, Manuel, and Stephanie Dickinson. In press. Using statistics as a tool in the analysis of sociolinguistic variation: A comparison of current and traditional methods. In Fernando Tejedo-Herrero (ed.), Lusophone, Galician, and Hispanic Linguistics: Bridging Frames and Traditions. Amsterdam: John Benjamins.
Gries, Stefan Th. 2015. Quantitative designs and statistical techniques. In Douglas Biber and Randi Reppen (eds.), The Cambridge Handbook of English Corpus Linguistics. Cambridge: Cambridge University Press.
Labov, W. 1966. The Social Stratification of English in New York City. Washington: Center for Applied Linguistics.
Scrivner, Olga, and Díaz-Campos, M. 2016. Language Variation Suite: A theoretical and methodological contribution for linguistic data analysis. Proceedings of the Linguistic Society of America, 1, 29–1
Strobl, Carolin, James Malley, and Gerhard Tutz. 2009. An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological Methods 14: 323–348.
Tagliamonte, Sali, and R. Harald Baayen. 2012. Models, forests and trees of York English: Was/were variation as a case study for statistical practice. Language Variation and Change 24: 135–178.
Tagliamonte, Sali. 2016. Quantitative analysis in language variation and change. In Fernando Tejedo-Herrero and Sandro Sessarego (eds.), Spanish Language and Sociolinguistic Analysis. Amsterdam: John Benjamins.
89
Thank you!
90
Appendix
Appendix 1: Density
92
Histogram: the number of bins can have a disproportionate effect on the visualization
Density: a non-parametric model of the distribution of points based
on a smooth density estimate
http://scikit-learn.org/stable/modules/density.html
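The contrast between the two displays can be reproduced in base R. A minimal sketch; the simulated durations are hypothetical, and the values passed to breaks are only suggestions that R rounds to convenient cut points.

```r
# Hypothetical durations drawn from two groups.
set.seed(1)
durations <- c(rnorm(100, mean = 50, sd = 5),
               rnorm(100, mean = 80, sd = 8))

# The same data binned coarsely and finely.
few_bins  <- hist(durations, breaks = 3,  plot = FALSE)
many_bins <- hist(durations, breaks = 30, plot = FALSE)
print(length(few_bins$counts))
print(length(many_bins$counts))

# A kernel density estimate does not depend on a bin count.
smooth <- density(durations)
plot(smooth, main = "Kernel density estimate")
```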
93
Appendix 2 - Data Modification
94
Adjust Data
Retain: Select data subset
Exclude: Exclude variables from a factor group
Recode: Combine and rename variables
Change class: Numeric → factor; factor → numeric
Transform: Apply log transformation to a specific column
ADJUSTED DATASET:
◦ Run - to apply all above changes
◦ Reset - to reset to the original dataset
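For readers who later move to R, each adjustment has a base R counterpart. The sketch below shows one possible equivalent, not the code LVS runs internally; the data frame and its column names are hypothetical.

```r
# Hypothetical dataset.
tokens <- data.frame(
  style    = c("normal", "emphatic", "normal", "casual", "emphatic"),
  duration = c(50, 80, 65, 40, 120),
  period   = c(1, 2, 2, 3, 1),
  stringsAsFactors = TRUE
)

# Retain: select a data subset.
normal_only <- subset(tokens, style == "normal")

# Exclude: drop one value of a factor group and its unused level.
no_emphatic <- droplevels(subset(tokens, style != "emphatic"))

# Recode: combine "casual" and "normal" under a new name.
recoded <- tokens
to_merge <- levels(recoded$style) %in% c("casual", "normal")
levels(recoded$style)[to_merge] <- "non_emphatic"

# Change class: numeric -> factor.
recoded$period <- factor(recoded$period)

# Transform: log transformation of a specific column.
recoded$log_duration <- log(recoded$duration)

str(recoded)
```

The original tokens data frame is left unchanged, which plays the role of Reset.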
95
Exclude: Emphatic Style
96
Adjusted Dataset
97
Adjusting Dataset
To revert to the original data, select RESET:
98
Appendix 3 - Model Comparison
99

 
Cognitive executive functions and Opioid Use Disorder
Cognitive executive functions and Opioid Use DisorderCognitive executive functions and Opioid Use Disorder
Cognitive executive functions and Opioid Use Disorder
 
Introduction to Web Scraping with Python
Introduction to Web Scraping with PythonIntroduction to Web Scraping with Python
Introduction to Web Scraping with Python
 
Call for paper Collaboration Systems and Technology
Call for paper Collaboration Systems and TechnologyCall for paper Collaboration Systems and Technology
Call for paper Collaboration Systems and Technology
 
Jupyter machine learning crash course
Jupyter machine learning crash courseJupyter machine learning crash course
Jupyter machine learning crash course
 
R and RMarkdown crash course
R and RMarkdown crash courseR and RMarkdown crash course
R and RMarkdown crash course
 
The Impact of Language Requirement on Students' Performance, Retention, and M...
The Impact of Language Requirement on Students' Performance, Retention, and M...The Impact of Language Requirement on Students' Performance, Retention, and M...
The Impact of Language Requirement on Students' Performance, Retention, and M...
 
If a picture is worth a thousand words, Interactive data visualizations are w...
If a picture is worth a thousand words, Interactive data visualizations are w...If a picture is worth a thousand words, Interactive data visualizations are w...
If a picture is worth a thousand words, Interactive data visualizations are w...
 
Introduction to Interactive Shiny Web Application
Introduction to Interactive Shiny Web ApplicationIntroduction to Interactive Shiny Web Application
Introduction to Interactive Shiny Web Application
 
Introduction to Overleaf Workshop
Introduction to Overleaf WorkshopIntroduction to Overleaf Workshop
Introduction to Overleaf Workshop
 
R crash course for Business Analytics Course K303
R crash course for Business Analytics Course K303R crash course for Business Analytics Course K303
R crash course for Business Analytics Course K303
 
Gender Disparity in Employment and Education
Gender Disparity in Employment and EducationGender Disparity in Employment and Education
Gender Disparity in Employment and Education
 
CrashCourse: Python with DataCamp and Jupyter for Beginners
CrashCourse: Python with DataCamp and Jupyter for BeginnersCrashCourse: Python with DataCamp and Jupyter for Beginners
CrashCourse: Python with DataCamp and Jupyter for Beginners
 
Optimizing Data Analysis: Web application with Shiny
Optimizing Data Analysis: Web application with ShinyOptimizing Data Analysis: Web application with Shiny
Optimizing Data Analysis: Web application with Shiny
 
Data Analysis and Visualization: R Workflow
Data Analysis and Visualization: R WorkflowData Analysis and Visualization: R Workflow
Data Analysis and Visualization: R Workflow
 
Reproducible visual analytics of public opioid data
Reproducible visual analytics of public opioid dataReproducible visual analytics of public opioid data
Reproducible visual analytics of public opioid data
 
Building Effective Visualization Shiny WVF
Building Effective Visualization Shiny WVFBuilding Effective Visualization Shiny WVF
Building Effective Visualization Shiny WVF
 
Building Shiny Application Series - Layout and HTML
Building Shiny Application Series - Layout and HTMLBuilding Shiny Application Series - Layout and HTML
Building Shiny Application Series - Layout and HTML
 

Último

BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 

Último (20)

BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 

Language Variation Suite Toolkit: Interactive Tools for Sociolinguistic Analysis

  • 1. Language Variation Suite Toolkit Olga Scrivner, Rafael Orozco, Manuel Díaz-Campos NWAV47, October 18, 2018 Indiana University and Louisiana State University
  • 3. What You Will Learn. Part 1: Cross-disciplinary Methods in Sociolinguistics. Part 2: Web Interface - a novel approach for interactive socio-analysis. Part 3: Working with Structured Data: LVS. Part 4: Working with Unstructured Data: ITMS
  • 4. Our Materials - Web Site: https://languagevariationsuite.wordpress.com/
  • 5. What We Need: Computer; WiFi; Smartphone (optional); R and RStudio (not really)
  • 8. Our Objectives. Goal: Optimize sociolinguistic analysis by using a variety of cross-disciplinary methods. 1. Introduce a novel (socio-)linguistic toolkit for research and teaching 2. Develop practical skills via an interactive, visual interface 3. Understand and interpret advanced statistical and data mining models
  • 11. Cross-disciplinary Methods: A Closer Look. Random Forest - a more stable alternative to stepwise variable selection (Tagliamonte and Baayen, 2012, 25). Conditional Tree - a hierarchical tree of significant factors and their relationships with other predictors
  • 12. Cross-disciplinary Methods: A Closer Look. Clustering - a grouping of variables, individuals, or documents (Gries and Hilpert, 2008, 69). Topic Modeling - the discovery of language patterns and word usage over time and across genres (Blei, 2012, 78)
  • 13. How To Unify All Methods: Data Science Workflow. 1. Import data into R: read_csv(), read_lines() [even Praat files!] 2. Tidy data - pre-processing 3. Transform with dplyr [select, subset, recode...] 4. Visualize with ggplot and plotly 5. Model with glm, lda, randomForest... Tidyverse - a system of R packages for data manipulation, exploration, and visualization that share a common design philosophy (Wickham and Grolemund, 2017)
  • 14. Sociolinguistic Data - Data Analytics. “Analytics is the critical technology needed to bring value out of data” (Anonymous)
  • 15. Sociolinguistic Data - Visual Analytics. “The science of analytical reasoning facilitated by visual interactive interfaces” (Thomas and Cook, 2005). “Visual analytics integrates new computational and theory-based tools with innovative interactive techniques and visual representations to enable human-information discourse” (Thomas and Cook, 2005). [Figure: conditional tree with splits on PositionSentence, Heaviness, Period, Focus, and Main_Verb_Structure; terminal nodes show VO/OV proportions]
  • 17. What is LVS? Language Variation Suite (LVS) is a Shiny web application designed for data analysis in sociolinguistic research. It can be used for: processing spreadsheet data; reporting in tables and graphs; analyzing means, regression, conditional trees ... (and much more)
  • 18. Background. LVS is built in R using the Shiny package: 1. R - a free programming language for statistical computing and graphics 2. Shiny - a web application framework for R. Computational power of R + Web interactivity
  • 20. Workspace. Browser: Chrome, Firefox, and Safari are recommended; Internet Explorer may cause instability issues. Accessibility: PC, Mac, Linux ◦ Data files can be uploaded from any location on your computer. Smartphone ◦ Data files must be on a cloud platform connected to your phone account (e.g. Dropbox)
  • 21. Server. Since LVS is hosted on a server, Shiny idle time-out settings may stop the application when it is left inactive (the page will grey out). Solution: click reload and re-upload your CSV file
  • 23. Data Preparation. Important things to consider before data entry: File format: ◦ Comma-separated values (CSV) - faster processing ◦ Excel format will slow processing. Column names should not contain spaces ◦ Permitted: non-accented characters, numbers, underscore, hyphen, and period. One column must contain your dependent variable. The rest of the columns contain independent variables
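The column-name rule above can be checked before uploading a file. The following is a minimal sketch in Python (stdlib only), not part of LVS, which is an R application. The function name and the sample header are hypothetical, and "non-accented characters" is assumed to mean ASCII letters.

```python
import csv
import io
import re

# Assumed reading of the slide's rule: ASCII letters, digits,
# underscore, hyphen, and period are permitted; spaces are not.
VALID_NAME = re.compile(r"^[A-Za-z0-9_.-]+$")


def invalid_column_names(csv_text):
    """Return the header names that break the naming rule."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if not VALID_NAME.match(name)]


# Hypothetical file content: two of the four column names are problematic.
sample = "r_use,store,lexical item,émphasis\nretention,Saks,floor,normal\n"
print(invalid_column_names(sample))  # ['lexical item', 'émphasis']
```

Renaming the flagged columns (for example, `lexical_item`) before saving the CSV avoids the problem at upload time.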
  • 24. Terminology Review. a. Categorical/Nominal - non-numerical data with two values ◦ yes - no; deletion - retention; perfective - imperfective b. Continuous - numerical data ◦ duration, age, chronological period c. Multinomial - non-numerical data with three or more values ◦ deletion - aspiration - retention d. Ordinal - scale
  • 25. Test Types (Dickinson, 2014), listed as dependent variable / independent variable(s) / test: ◦ Continuous / Continuous / t-test ◦ Categorical / Categorical / Chi-square ◦ Continuous / Multiple Categorical or Continuous / Linear Regression ◦ Categorical (Binary or Multinomial) / Multiple Categorical or Continuous / Logistic Regression
  • 28. Workshop Files: https://languagevariationsuite.wordpress.com/ 1. categoricaldata.csv: categorical dependent - Labov New York 1966 study 2. continuousdata.csv: continuous dependent - Intervocalic /d/ in Caracas corpus (Díaz-Campos and Gradoville, 2011; Scrivner and Díaz-Campos, 2016) 3. LVS web site: www.languagevariationsuite.com
  • 30. Language Variation Suite - Structure. 1. Data ◦ Upload file, data summary, adjust data, cross tabulation 2. Visual Analysis ◦ Plotting, cluster classification 3. Inferential Statistics ◦ Modeling, regression, conditional trees, random forest
  • 33. Excel Format. 1. Slow processing 2. Requires the name of your Excel sheet
  • 34. TIPS: Save Your Excel File in CSV Format. To optimize speed, save as CSV prior to upload
  • 36. Uploaded Dataset. The data content is imported as a table that can be sorted by column.
  • 37. Summary. Summary provides a quantitative summary for each variable, e.g. frequency count, mean, median.
  • 38. Data Structure. 1. Total number of observations (rows) 2. Number of variables (columns) 3. Variable types ◦ Factor - categorical values ◦ Num - numeric values (0.95, 1.05) ◦ Int - integer values (1, 2, 3)
  • 39. Cross Tabulation. Cross-tabulation examines the relationship between variables.
  • 41. Cross-Tabulation Output: Raw frequency / Proportion by column / Proportion across row
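The three output views can be illustrated with a few lines of code. This is a minimal sketch in Python (stdlib only), independent of how LVS computes its tables in R. The tokens are invented for illustration and are not taken from the workshop files.

```python
from collections import Counter


def crosstab(rows, row_var, col_var):
    """Return raw counts, proportions by column, and proportions across rows."""
    counts = Counter((r[row_var], r[col_var]) for r in rows)
    row_levels = sorted({key[0] for key in counts})
    col_levels = sorted({key[1] for key in counts})
    raw = {rl: {cl: counts[(rl, cl)] for cl in col_levels} for rl in row_levels}
    col_totals = {cl: sum(raw[rl][cl] for rl in row_levels) for cl in col_levels}
    row_totals = {rl: sum(raw[rl].values()) for rl in row_levels}
    # Each column sums to 1 in by_col; each row sums to 1 in by_row.
    by_col = {rl: {cl: raw[rl][cl] / col_totals[cl] for cl in col_levels}
              for rl in row_levels}
    by_row = {rl: {cl: raw[rl][cl] / row_totals[rl] for cl in col_levels}
              for rl in row_levels}
    return raw, by_col, by_row


# Invented tokens (hypothetical data, for illustration only).
tokens = [
    {"r": "retention", "store": "Saks"},
    {"r": "retention", "store": "Saks"},
    {"r": "deletion", "store": "Saks"},
    {"r": "retention", "store": "Kleins"},
    {"r": "deletion", "store": "Kleins"},
    {"r": "deletion", "store": "Kleins"},
    {"r": "deletion", "store": "Kleins"},
]

raw, by_col, by_row = crosstab(tokens, "r", "store")
print(raw["deletion"]["Kleins"])      # 3 tokens
print(by_col["deletion"]["Kleins"])   # 0.75 of all Kleins tokens
print(by_row["deletion"]["Kleins"])   # 0.75 of all deletion tokens
```

Proportion by column answers "what share of each store's tokens show deletion?", while proportion across row answers "what share of all deletions come from each store?".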
  • 44. Language Variation Suite - Structure. 1. Data ◦ Upload file, data summary, adjust data, cross tabulation 2. Visual Analysis ◦ Plotting, cluster classification 3. Inferential statistics ◦ Modeling, regression, varbrul analysis, conditional trees, random forest
  • 45. Adjusting Browser - Layout. Shiny pages are fluid and reactive.
  • 50. Cluster Plot. Classification of data into sub-groups is based on pairwise similarities. Groups are clustered in the form of a tree-like dendrogram
  • 52. Cluster Plot. Saks (upper-middle-class store), Macy’s (middle-class store), Kleins (working-class store)
  • 57. How to Create a Regression Model. Step 1: Modeling - create a model with dependent and independent variables. Step 2: Regression - specify the type of regression (fixed, mixed) and the type of dependent variable (binary, continuous, multinomial). Step 3: Stepwise Regression - compare models (log-likelihood, AIC, BIC). Step 4: Conditional Trees - apply non-parametric tests to the model
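The comparison criteria in Step 3 follow standard definitions: AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of estimated parameters, n the number of observations, and ln L the log-likelihood. The sketch below (Python, stdlib only) applies them to two hypothetical models. The log-likelihoods, parameter counts, and sample size are invented, and this is not LVS output.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: penalizes extra parameters more as n grows."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical models fitted to the same 730 tokens (invented numbers).
simple_model = {"loglik": -420.0, "k": 3}
complex_model = {"loglik": -418.5, "k": 6}
n_tokens = 730

print(aic(simple_model["loglik"], simple_model["k"]))    # 846.0
print(aic(complex_model["loglik"], complex_model["k"]))  # 849.0
print(bic(simple_model["loglik"], simple_model["k"], n_tokens))
print(bic(complex_model["loglik"], complex_model["k"], n_tokens))
```

In this invented case the complex model has a slightly higher log-likelihood, but both criteria favor the simpler model because the gain does not offset the three additional parameters.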
  • 59. Modeling. We are interested in RETENTION = Application
  • 60. Regression Types. Model: a.) Fixed effect b.) Mixed effect - individual speaker/token variation (within group). Type of Dependent Variable: a.) Binary/categorical (only two values) b.) Continuous (numeric) c.) Multinomial - categorical with more than two values
  • 64. Interpretation. Deletion is the reference value. Positive coefficient - positive effect. Negative coefficient - negative effect
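Assuming the model is a binary logistic regression, so that coefficients are on the log-odds scale, the sign rule above can be made concrete by converting a coefficient to an odds ratio and a log-odds value to a probability. The sketch below uses Python (stdlib only). The coefficient values are hypothetical and are not taken from the workshop output.

```python
import math


def odds_ratio(coefficient):
    """exp(coefficient): above 1 favors the application value, below 1 disfavors it."""
    return math.exp(coefficient)


def probability(log_odds):
    """Inverse logit: map a log-odds value to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients relative to the reference level (invented values).
for label, coef in [("positive effect", 1.2), ("no effect", 0.0), ("negative effect", -0.8)]:
    print(label, round(odds_ratio(coef), 3), round(probability(coef), 3))
```

A coefficient of 0 corresponds to an odds ratio of 1 and a probability of 0.5, which is why the sign of the coefficient indicates the direction of the effect.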
  • 66. Interpretation - RETENTION. Lexical item Fourth has a significant negative effect on retention compared to Floor. Normal style has a slightly negative effect on retention, but its coefficient is not significant. Macy’s and Saks have a positive and significant effect on retention. Saks (upper middle class store) is more significant than Macy’s (middle class store). P-values may appear in exponential notation: 1.48e-8 = 0.0000000148 (converter: http://www.free-online-calculator-use.com/scientific-notation-converter.html)
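The exponential notation can also be expanded without an online converter. The snippet below is a small illustration in Python (stdlib only). The helper name `to_plain` and the 0.05 threshold are assumptions made for the example.

```python
from decimal import Decimal


def to_plain(value_text):
    """Expand a number written in exponential notation into plain decimal digits."""
    return format(Decimal(value_text), "f")


print(to_plain("1.48e-8"))      # prints 0.0000000148
print(float("1.48e-8") < 0.05)  # prints True: well below a 0.05 threshold
```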
  • 68. Conditional Tree. Conditional tree: a simple non-parametric regression analysis, commonly used in social and psychological studies. Linear regression: all information is combined linearly. Conditional tree regression: visual splitting to capture interactions between variables. Recursive splitting (tree branches)
  • 69. Conditional Tree - Tagliamonte and Baayen (2012). 1. The distribution of was/were is split into two groups by individuals. 2. The variant were occurs significantly more frequently in the first group.
  • 70. Conditional Tree - Tagliamonte and Baayen (2012). 1. Polarity is relevant to the second group of individuals. 2. The variant were occurs significantly more often with negative polarity
  • 71. Conditional Tree - Tagliamonte and Baayen (2012). 1. Affirmative polarity is conditioned by age. 2. The variant was is produced significantly more often by individuals aged 46 and younger.
  • 73. Conditional Tree. 1. Store is the most significant factor for R-use ◦ Kleins (working class store) - more R-deletion 2. R-use in Macy’s and Saks is conditioned by lexical item: ◦ Floor shows more R-retention than Fourth 3. Style is not significant
  • 75. Random Forest. 1. Variable importance for predictors 2. Robust technique for "small n, large p" data 3. All predictors considered jointly (allows for inclusion of correlated factors)
  • 77. Random Forest. 1. Store is the most important predictor 2. Lexical Item is the second most important predictor 3. Style is irrelevant: its importance is close to zero and to the red dotted line (cut-off value).
  • 79. Fixed and Mixed Models. Fixed Effects Model: All predictors are treated as independent. Underlying assumption - no group-internal variation between speakers or tokens. Mixed Effects Model: Allows for evaluation of individual- and group-level variation
  • 80. Fixed and Mixed Models. Fixed Regression Model - ignoring individual variation (speakers or words) may lead to Type I Error: “a chance effect is mistaken for a real difference between the populations”. Mixed Regression Model - prone to Type II Error: “if speaker variation is at a high level, we cannot discern small population effects without a large number of speakers” (Johnson 2009, 2015)
  • 81. Mixed Effect Regression. Mixed Model = fixed effects + random effects. Fixed-effect factor - “repeatable and a small number of levels”, e.g. event verb, stative verb; male, female. Random-effect factor - “a non-repeatable random sample from a larger population”, e.g. walk, sleep, study, finish, eat, etc.; speaker1, speaker2, speaker3, etc. (Wieling 2012)
  • 83. Preparing for Mixed Model. 1. Download continuousdata.csv 2. Upload this file to LVS
  • 84. Data - Uploaded Dataset
  • 85. Mixed Effect Modeling. NULL when the dependent variable is continuous. Fixed Effects - independent variables
  • 86. Mixed Effect Modeling. Mixed Effects - group-internal variation
  • 89. Interpretation - Random Effects. 1. Standard Deviation: a measure of the variability for each random effect (speakers and tokens) 2. Residual: random variation that is not due to speakers or tokens (residual error)
  • 90. Interpretation - Fixed Effects. 1. Estimate/coefficient: reported in log-odds (negative or positive) 2. P-value: tells you if the level is significant
  • 91. Bonus - Word Clouds
  • 95. Publications
    ◦ Scrivner, Olga, and Manuel Díaz-Campos. 2016. Language Variation Suite: A theoretical and methodological contribution for linguistic data analysis. Proceedings of the Linguistic Society of America, 1, 29–1
    ◦ Estrada, Monica. 2016. El tú no es de nosotros, es de otros países: Usos del voseo y actitudes hacia él en el castellano hondureño. Master's thesis. Louisiana State University
    ◦ Orozco, Rafael. 2018. El castellano del Caribe colombiano en la ciudad de Nueva York: El uso variable de sujetos pronominales. Studies in Hispanic and Lusophone Linguistics 11(1): 89–129
  • 96. Students' Perspective: Feedback. “This program makes analysis so much more accessible to someone like myself who is new to statistical analysis” “I realize it is actually a tool of analysis every linguist will appreciate a lot for quantitative analysis”
  • 98. References I
    ◦ Baayen, Harald. 2008. Analyzing linguistic data: A practical introduction to statistics. Cambridge: Cambridge University Press
    ◦ Bentivoglio, Paola and Mercedes Sedano. 1993. Investigación sociolingüística: sus métodos aplicados a una experiencia venezolana. Boletín de Lingüística 8: 3–35
    ◦ Díaz-Campos, Manuel and Stephanie Dickinson. In press. Using Statistics as a Tool in the Analysis of Sociolinguistic Variation: A comparison of current and traditional methods. In Fernando Tejedo-Herrero (Ed.), Lusophone, Galician, and Hispanic Linguistics: Bridging Frames and Traditions. Amsterdam: John Benjamins
    ◦ Gries, Stefan Th. 2015. Quantitative designs and statistical techniques. In Douglas Biber and Randi Reppen (Eds.), The Cambridge Handbook of English Corpus Linguistics. Cambridge: Cambridge University Press
    ◦ Labov, W. 1966. The Social Stratification of English in New York City. Washington: Center for Applied Linguistics
    ◦ Scrivner, Olga, and Manuel Díaz-Campos. 2016. Language Variation Suite: A theoretical and methodological contribution for linguistic data analysis. Proceedings of the Linguistic Society of America, 1, 29–1
    ◦ Strobl, Carolin, James Malley, and Gerhard Tutz. 2009. An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological Methods 14: 323–348
    ◦ Tagliamonte, Sali, and R. Harald Baayen. 2012. Models, forests and trees of York English: Was/were variation as a case study for statistical practice. Language Variation and Change 24: 135–178
    ◦ Tagliamonte, Sali. 2016. Quantitative analysis in language variation and change. In Fernando Tejedo-Herrero and Sandro Sessarego (Eds.), Spanish Language and Sociolinguistic Analysis. Amsterdam: John Benjamins
  • 101. Appendix 1: Density. The number of bins can have a disproportionate effect on visualization
  • 102. Histogram. Density: a non-parametric model of the distribution of points based on a smooth density estimate (http://scikit-learn.org/stable/modules/density.html)
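To see why bin count matters and how a smooth density estimate avoids that choice, the sketch below bins the same values three ways and then evaluates a simple Gaussian kernel density estimate. It is written in Python (stdlib only) as an illustration of the general technique, not of the plotting code LVS uses. The values and the bandwidth of 1.0 are invented.

```python
import math


def histogram(values, n_bins):
    """Count values in n_bins equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value falls on the upper edge; keep it in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def kernel_density(values, bandwidth):
    """Return a function giving a Gaussian kernel density estimate at x."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


# Invented measurements (hypothetical data).
values = [1, 2, 2, 3, 3, 3, 7, 8, 8, 9]

print(histogram(values, 2))  # [6, 4]
print(histogram(values, 4))  # [3, 3, 0, 4]
print(histogram(values, 8))  # [1, 2, 3, 0, 0, 0, 1, 3]

density = kernel_density(values, bandwidth=1.0)
print(density(3) > density(5))  # True: the estimate dips between the two clusters
```

With two bins the gap between the clusters disappears, while four or eight bins reveal it. The density estimate shows the same dip without a bin choice, although it depends on the bandwidth instead.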
  • 103. Appendix 2 - Data Modification
  • 104. Adjust Data. Retain: Select data subset. Exclude: Exclude variables from a factor group. Recode: Combine and rename variables. Change class: Numeric → factor; factor → numeric. Transform: Apply log transformation to a specific column. ADJUSTED DATASET: ◦ Run - to apply all above changes ◦ Reset - to reset to the original dataset
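The adjustment operations map onto simple table transformations. The sketch below (Python, stdlib only) mimics Retain, Exclude, Recode, class change, and log transformation on a list of rows. The function names, column names, and values are hypothetical, and this is not the code LVS runs.

```python
import math


def retain(rows, column, keep):
    """Retain: keep only rows whose value in `column` is in `keep`."""
    return [r for r in rows if r[column] in keep]


def exclude(rows, column, drop):
    """Exclude: remove rows whose value in `column` is in `drop`."""
    return [r for r in rows if r[column] not in drop]


def recode(rows, column, mapping):
    """Recode: combine or rename levels; unmapped levels stay unchanged."""
    return [{**r, column: mapping.get(r[column], r[column])} for r in rows]


def log_transform(rows, column):
    """Change class (text to numeric), then apply the natural log."""
    return [{**r, column: math.log(float(r[column]))} for r in rows]


# Invented rows (hypothetical data); values start as text, as read from a CSV.
original = [
    {"store": "Saks", "style": "normal", "duration": "1.0"},
    {"store": "Macys", "style": "emphatic", "duration": "2.5"},
    {"store": "Kleins", "style": "normal", "duration": "4.0"},
]

# "Run": apply the changes in sequence, producing a new adjusted dataset.
adjusted = recode(original, "store", {"Saks": "upper", "Macys": "upper"})
adjusted = exclude(adjusted, "style", {"emphatic"})
adjusted = log_transform(adjusted, "duration")
print(adjusted)

# "Reset": the original rows were never modified, so they can simply be reused.
print(original[0])
```

Each helper returns a new list rather than editing rows in place, which is what makes a reset to the original dataset trivial in this sketch.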
  • 107. Adjusting Dataset. To revert to the original data, select RESET
  • 108. Appendix 3 - Model Comparison