Dr Nisha Arora
Data Structure in R
https://www.linkedin.com/in/drnishaarora/
Contents
2
Data Structure in R
Vector
List
Factor
Matrix
Data Frame
Array
Subsetting & Basic Operations
3
Coercion
All elements of a vector must be the same type, so when we
attempt to combine different types they will be coerced to the
most flexible type.
Types from least to most flexible are:
Logical
Integer
Double/Numeric
Character
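A minimal sketch of the coercion order above, using typeof() to inspect the result. The variable names are illustrative only.

```r
# Combining types with c() coerces everything to the most flexible type present.
lgl_int <- c(TRUE, 2L)      # logical + integer  -> integer
int_dbl <- c(2L, 4.5)       # integer + double   -> double
dbl_chr <- c(4.5, "abc")    # double + character -> character

typeof(lgl_int)   # "integer"
typeof(int_dbl)   # "double"
typeof(dbl_chr)   # "character"
```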
4
Coercion
When a logical vector is coerced to an integer or double, TRUE
becomes 1 and FALSE becomes 0
x <- c(FALSE, FALSE, TRUE); as.numeric(x)
Total number of TRUEs
sum(x)
Proportion that are TRUE
mean(x)
5
Coercion in R
✓ To explicitly coerce an object to another type, use functions
such as as.numeric(), as.logical(), etc.
In Python, this is called ‘typecasting’
https://youtu.be/FJ6IkFycCdA
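As a hedged illustration of explicit coercion, the sketch below shows what happens when a value cannot be converted: R returns NA instead of stopping. The input vectors are made up for this example.

```r
# Explicit coercion with the as.*() family.
chr <- c("1", "2.5", "abc")

# "abc" cannot be read as a number, so it becomes NA (R also emits a warning,
# which is silenced here to keep the output clean).
num <- suppressWarnings(as.numeric(chr))
num                                      # 1.0 2.5 NA

as.logical(c("TRUE", "false", "yes"))    # TRUE FALSE NA ("yes" is not recognised)
as.integer(3.9)                          # 3 (truncates towards zero)
```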
6
Data Structure in R
✓ Vector
The basic one dimensional data structure in R is the vector
✓ List
Lists are generic vectors which can contain any data type or
data structure
✓ Matrix
The basic two dimensional data structure in R is the matrix
Note: A variable with a single value is known as a scalar. In R, a
scalar is simply a vector of length 1
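The note about scalars can be checked directly. This is a small sketch; the name s is arbitrary.

```r
s <- 5
is.vector(s)   # TRUE
length(s)      # 1
s[1]           # 5: a "scalar" can be indexed like any other vector
```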
7
Data Structure in R
✓ Factor
A factor is a vector that can contain only predefined values, and
is used to store categorical data
✓ Data Frame
A data frame is a list of equal-length vectors. This makes it a
2-dimensional structure, so it shares properties of both the matrix
and the list.
✓ Array
An array is an n-dimensional data structure. A matrix is a special
case of an array with 2 dimensions.
We will discuss ‘tibble’ from the tidyverse in the next lesson
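Arrays are the only structure on this slide without an example elsewhere in the deck, so here is a minimal sketch. The dimensions and names are chosen for illustration.

```r
# A 2 x 3 x 4 array: 4 "layers", each a 2 x 3 matrix, filled column by column.
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)        # 2 3 4
arr[1, 2, 3]    # 15: row 1, column 2, layer 3

# A matrix is an array with exactly two dimensions.
m <- matrix(1:6, nrow = 2)
is.array(m)     # TRUE
```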
8
Vectors in R
To create vectors in R, use the concatenation function c()
num_var <- c(1, 2, 4.5)
Use the L suffix to get an integer rather than a double
int_var <- c(13L, 0L, 10L)
Use TRUE and FALSE (or T and F) to create logical vectors
log_var <- c(TRUE, FALSE, T, F)
Use double or single quotation marks to create a character vector
chr_var <- c("abc", "123")
Vectors can also be created with the seq() or scan() functions
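A short sketch of the sequence and scan approaches. scan() normally reads from the console or a file; its text argument is used here only so the example is self-contained.

```r
a <- 1:5                                 # colon operator: 1 2 3 4 5
b <- seq(from = 0, to = 1, by = 0.25)    # 0.00 0.25 0.50 0.75 1.00
d <- seq_len(3)                          # 1 2 3
e <- rep(c("x", "y"), times = 2)         # "x" "y" "x" "y"

# Read numbers from a string instead of the keyboard.
f <- scan(text = "10 20 30", quiet = TRUE)
f                                        # 10 20 30
```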
9
Vectors in R
To name a vector
# Assigning names directly
x <- c(Mon = 37, Tue = 41.4, Wed = 43.2)
# Using names() function
x <- c(78, 86, 89); names(x) <- c("chem", "phy", "math")
# Using setNames() function
x <- setNames(1:3, c("a", "b", "c"))
10
Vector Subsetting
x = c(11,42,23,14,55);
names(x) = c('ajay', 'ravi', 'john', 'anjali', 'namrata'); x
x[2]; x[1:3]; x[5]; x[7]
# x[n] gives the nth element of vector x; there are only 5 elements,
# so x[7] is NA
x['ajay']; x[c('ravi', 'namrata')]
# To select elements by names
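Beyond positions and names, vectors can also be subset with negative and logical indices. The sketch below reuses the same named vector, built in one step.

```r
x <- c(ajay = 11, ravi = 42, john = 23, anjali = 14, namrata = 55)

x[-1]               # drop the first element
x[c(-1, -5)]        # drop the first and last elements
x[x > 20]           # keep elements greater than 20: ravi, john, namrata
x[c(TRUE, FALSE)]   # logical index is recycled: positions 1, 3, 5
```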
11
List in R
Lists are different from vectors because their elements can be of
any type, including lists.
We can construct lists by using list() instead of c()
x <- list(1:4, "abc", c(T, T, F), c(2.3, 5.9))
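To see that each element keeps its own type, and that lists can nest, a small sketch follows. The list y and its element names are invented for illustration.

```r
x <- list(1:4, "abc", c(TRUE, TRUE, FALSE), c(2.3, 5.9))
length(x)            # 4
sapply(x, typeof)    # "integer" "character" "logical" "double"

# Lists can be nested, and their elements can be named.
y <- list(id = 1L, tags = c("a", "b"), inner = list(p = 1, q = "two"))
str(y)               # compact overview of the nested structure
```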
12
List Subsetting
https://stackoverflow.com/a/49699955/5114585
Three ways:
Using single square brackets ‘[ ]’
Using double square brackets ‘[[ ]]’
Using ‘$’ with element names
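The three approaches differ in what they return. The sketch below uses a hypothetical named list, lst, to show the difference.

```r
lst <- list(num = 1:4, txt = "abc", flag = c(TRUE, FALSE))

lst[1]        # single bracket: a list of length 1 that contains num
lst[[1]]      # double bracket: the element itself, the integer vector 1:4
lst$txt       # $ with a name: same as lst[["txt"]]

class(lst[1])            # "list"
class(lst[[1]])          # "integer"
lst[c("num", "flag")]    # single bracket can select several elements at once
```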
13
Matrix in R
To create matrix in R
x = matrix(1:9, nrow = 3, ncol = 3)
x = matrix (1:9, 3, 3) # Alternate way
To fill a matrix row by row
z = matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
# By default byrow is FALSE, so matrix is created by column
a <- matrix(1:9, byrow=TRUE, nrow=3) # Alternate way
14
Matrix in R
To create matrix by using cbind() command
one <- c(1,0,0)
two <- c(0,1,0)
three <- c(0,0,1)
b <- cbind(one, two, three)
To create a matrix by using rbind() command
c <- rbind(one, two, three)
15
Matrix in R
To assign names to columns and rows of matrix
x = cbind(c(78, 85, 95), c(99, 91, 85), c(67, 62, 63))
colnames(x) = c("Jan", "Feb", "Mar")
rownames(x) = c("product1", "product2", "product3")
Other useful commands
dim(x); head(x); nrow(x); ncol(x); attributes(x)
rowSums(x); colSums(x)
16
Matrix Subsetting
To find sub matrices of a given matrix
x <- matrix(1:6, 2, 3)
x[1, 2] # Element of first row, second column [single element]
x[2, 1] # Element of second row, first column [single element]
x[2, ] # All the elements of second row [vector, by default]
x[, 1] # All the elements of first column [vector, by default]
x[1:2, 3] # Elements of first & second row for third column only
17
Matrix Subsetting
To find sub matrices of a given matrix
x <- matrix(1:6, 2, 3)
By default, when a single element of a matrix is retrieved, it is returned
as a vector of length 1 rather than a 1 × 1 matrix.
This behaviour can be turned off by setting drop = FALSE.
x[1, 2] # Single element
x[1, 2, drop = FALSE] # Matrix of one row & one column
18
Matrix Subsetting
To find sub matrices of a given matrix
x <- matrix(1:6, 2, 3)
Similarly, sub-setting a single column or a single row results in a
vector, not a matrix (by default).
This behaviour can be turned off by setting drop = FALSE.
x[1, ] # Single row
x[1, , drop = FALSE] # Matrix of one row & three columns
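One way to confirm the effect of drop = FALSE is to inspect dim() on each result, as in this minimal sketch.

```r
x <- matrix(1:6, 2, 3)

dim(x[1, ])                  # NULL: a plain vector of length 3
dim(x[1, , drop = FALSE])    # 1 3: still a matrix
dim(x[, 2, drop = FALSE])    # 2 1: a one-column matrix
```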
19
Matrix Subsetting
To find sub matrices of a given matrix
x = cbind(c(78, 85, 95), c(99, 91, 85), c(67, 62, 63))
x[ , 2]
x[ ,2:3]
x[ 2, 3]
x[1:2, 3]
20
Factors in R
Factors are used for handling categorical variables, whether
nominal or ordinal.
For example,
Male, Female: nominal categorical
Low, Medium, High: ordinal categorical
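The nominal/ordinal distinction maps onto unordered and ordered factors. A sketch, assuming a made-up vector called size:

```r
size <- c("Low", "High", "Medium", "Low")

# Nominal: levels carry no order (they are sorted alphabetically by default).
size_nominal <- factor(size)
levels(size_nominal)      # "High" "Low" "Medium"

# Ordinal: state the order explicitly and mark the factor as ordered.
size_ordinal <- factor(size, levels = c("Low", "Medium", "High"), ordered = TRUE)
size_ordinal < "High"     # TRUE FALSE TRUE TRUE
```

Comparisons such as < are only meaningful for the ordered version.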
21
Factors in R
To create a factor in R using factor()
gender_vector <- c("Male", "Female", "Female", "Male", "Male")
factor_gender_vector <- factor(gender_vector)
Also, try levels(factor_gender_vector)
To change the levels of a factor
# Levels are sorted alphabetically (Female, Male), so "F" comes first
levels(factor_gender_vector) = c("F", "M")
Other useful commands
summary(factor_gender_vector); table(factor_gender_vector)
22
Data frames in R
A data frame is the most common way of storing data in R, and if used
systematically makes data analysis easier.
✓ Similar to tables (databases), dataset (Excel/SAS/SPSS) etc.
✓ Consists of columns of different types; More general than a matrix
✓ Columns – Variables; Rows – Observations
✓ Convenient to hold all the data required for a data analysis
✓ They are represented as a special type of list where every element of
the list has to have the same length
✓ Data frames also have a special attribute called row.names
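The "special type of list" point can be verified on a small data frame. This is an illustrative sketch, not part of the original slides.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

is.list(df)        # TRUE: each column is one list element
length(df)         # 2: the number of columns
dim(df)            # 3 2: rows, columns
sapply(df, class)  # class of each column
row.names(df)      # "1" "2" "3"

# List-style and matrix-style access both work.
df$x; df[["y"]]; df[2, ]
```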
23
Data frames in R
✓ Data frames are, well, tables (like in any spreadsheet program).
✓ In data frames variables are typically in the columns, and cases in
the rows.
✓ Columns can have mixed types of data; some can contain
numeric, yet others text.
✓ If all columns contain the same type of data (all character or all
numeric), then the data can also be stored in a matrix (those are
faster to operate on).
We will also discuss ‘tibble’ in the course.
24
Data frames in R
To create a data frame in R
Example_1:
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
Example_2:
height <- c(180,175,190)
weight <- c(75,82,88)
name <- c("Anil","Ankit","Sunil")
data <- data.frame(name, height, weight)
25
Data frames in R
To combine data frames in R
Example_1: using cbind()
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
cbind(df, data.frame(z = 3:1))
Example_2: using rbind()
rbind(df, data.frame(x = 10, y = "z"))
27
Data Type Conversions
Use is.foo to test for data type foo.
Returns TRUE or FALSE
Use as.foo to explicitly convert it.
Examples:
is.numeric(), is.character(), is.vector(), is.matrix(), is.data.frame()
as.numeric(), as.character(), as.vector(), as.matrix(), as.data.frame()
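A brief sketch of the test-then-convert pattern with the is.* and as.* families. The object names are arbitrary.

```r
v <- c("1", "2", "3")
is.character(v)    # TRUE
is.numeric(v)      # FALSE

n <- as.numeric(v)
is.numeric(n)      # TRUE

m <- as.matrix(data.frame(a = 1:2, b = 3:4))
is.matrix(m)       # TRUE
d <- as.data.frame(m)
is.data.frame(d)   # TRUE
```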
28
Handling of missing values
X <- c(1:8,NA)
✓ Ignoring missing values when computing a summary
mean(X, na.rm = TRUE) # na.rm = T also works
✓ To check for the location of missing values within a vector
which(is.na(X))
✓ To replace the missing values with a placeholder number, say, 999
X[which(is.na(X))] = 999
For more code: follow me on GitHub
29
Handling of missing values
x <- c(1, 2, NA, 4, NA, 5)
✓ Identify missing values
bad <- is.na(x)
✓ To remove missing values
x[!bad]
30
Handling of missing values
x <- c(1, 2, NA, 4, NA, 5); y <- c("a", "b", NA, "d", "e", NA)
df = data.frame(x,y)
✓ To take the subset of data frame with no missing value
good = complete.cases(x,y); good
df[good, ]
✓ To take the subset of vector x with no missing value
x[good]
✓ To take the subset of vector y with no missing value
y[good]
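The same idea applied to the whole data frame at once, as a minimal sketch. complete.cases() and na.omit() come from the stats package, which is attached by default.

```r
x <- c(1, 2, NA, 4, NA, 5)
y <- c("a", "b", NA, "d", "e", NA)
df <- data.frame(x, y)

good <- complete.cases(df)   # TRUE where a row has no NA in any column
df[good, ]                   # rows 1, 2 and 4
na.omit(df)                  # the same rows, via a helper function
colSums(is.na(df))           # number of NAs per column: x = 2, y = 2
```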
Books
31
✓ Crawley, M. J. (2007). The R Book. Chichester, England:
John Wiley & Sons, Ltd.
✓ An Introduction to R by W. N. Venables, D. M. Smith and
the R Core Team
✓ R in a Nutshell by Joseph Adler: O’Reilly
✓ Teetor, P. (2011). R cookbook. Sebastopol, CA: O’Reilly
Media Inc.
Books
32
✓ Bio Statistics - https://www.middleprofessor.com/files/applied-biostatistics_bookdown/_book/
✓ Advanced R - https://adv-r.hadley.nz/
✓ Data Visualization - https://rkabacoff.github.io/datavis/
✓ R for Data Science - https://r4ds.had.co.nz/index.html
✓ Data Exploration & Analysis - https://bookdown.org/mikemahoney218/IDEAR/
✓ https://bookdown.org/mikemahoney218/LectureBook/
Blogs & Communities
33
http://www.r-bloggers.com/
http://www.inside-r.org/blogs
https://blog.rstudio.org/
http://www.statmethods.net/
http://stats.stackexchange.com
https://www.researchgate.net
https://www.quora.com
https://github.com
Learn To Code
34
https://www.datacamp.com/
https://www.dataquest.io/
https://www.codeschool.com/
https://guide.freecodecamp.org/r/
https://www.hackerrank.com/contests/co/
https://www.hackerearth.com/practice/
https://hackernoon.com/tagged/r
https://rpubs.com/
35
Reach Out to Me
http://stats.stackexchange.com/users/79100/learner
https://www.researchgate.net/profile/Nisha_Arora2/contributions
https://www.quora.com/profile/Nisha-Arora-9
https://github.com/arora123/
https://www.youtube.com/channel/UCniyhvrD_8AM2jXki3eEErw
https://www.linkedin.com/in/drnishaarora/
Dr.aroranisha@gmail.com
Thank You
