Introduction to Data Analysis and Graphics2
Hellen Gakuruh
2017-03-07
Session Two
Vector and Assignment, Data Objects and Data Importation
Outline
By the end of this session we will know about:
• Vectors and Assignment
• Data types
• Data structures and
• Importing data into R
Vector and Assignment
• The simplest data structure in R is a vector. From a data point of view, a
vector is a collection of elements. These elements can be numeric values,
alphabetical characters, logical values, or date and time values.
• Vectors are created with the function “c”, which means “concatenate”, e.g. a
numerical vector c(1, 5, 6, 8)
• These vectors can be named using the assignment operator “<-” or the
function “assign()”, e.g. to assign vector c(1, 5, 6, 8) to the name “num”:
num <- c(1, 5, 6, 8) or assign("num", c(1, 5, 6, 8)). We usually use “<-” for
assignment; the “assign()” function is mostly used when developing functions
• A vector can be empty (length 0) or hold up to 2^31 - 1 (about 2.147 × 10^9)
elements; 64-bit builds of R since version 3.0.0 also support longer vectors
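The two ways of naming a vector described above can be tried side by side. This is a minimal sketch; the name "num2" is only an illustrative assumption and does not come from the slides.

```r
# Create a numeric vector and bind it to the name "num"
num <- c(1, 5, 6, 8)

# assign() does the same job, taking the name as a character string
assign("num2", c(1, 5, 6, 8))

identical(num, num2)  # both names now refer to equal vectors
length(num)           # number of elements in the vector
```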
Data types
This session covers seven data types, these are:
• Logical
• Integer
• Real/Double
• String/Character
• Factor
• Complex
• Raw
• The R manuals specify six basic types: logical, integer, double, character,
complex and raw. A factor is not a seventh basic type; it is stored as an
integer vector with a “levels” attribute and class “factor”, but it behaves
differently enough to be introduced separately here.
• In this sub-section we introduce these data types
Data types: Logical
• These are vectors with only TRUE and FALSE values, like c(TRUE, TRUE,
FALSE, TRUE, FALSE)
• Can be considered as binary vectors in analysis
• Other than categorical variables with these values, these vectors are often
created by comparison operators (“<”, “>”, “<=”, “>=”, “==”, “!=”) and
combined with logical operators (“!”, “|”, “||”, “&”, “&&”)
• During analysis, these vectors can be coerced to numeric values, in which
case TRUE becomes 1 and FALSE becomes 0
• Like other vectors, they can include the value “NA”, which in R means
“Not Available”, a placeholder for missing values.
• Most operations on a vector containing NA result in NA, since the missing
value is unknown
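A short sketch of how a comparison creates a logical vector, how it is coerced to numbers, and how NA propagates. The "age" values are made up for illustration.

```r
age <- c(12, 35, NA, 61, 18)   # hypothetical ages, one of them missing

adult <- age >= 18             # a comparison returns a logical vector
adult                          # the missing age gives NA, not TRUE or FALSE

sum(adult, na.rm = TRUE)       # TRUE counts as 1 and FALSE as 0
mean(age)                      # NA, because one value is unknown
mean(age, na.rm = TRUE)        # drop the NA before computing
```

Many summary functions accept na.rm = TRUE, which removes missing values before the calculation.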
Data types: Integer
• These are whole numbers (positive, negative or zero) without fractions {. . . ,
-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, . . . }
• In R, integers are written with the suffix L, e.g. c(-3L, 0L, 2L, 5L, 6L). You
can confirm a vector is an integer vector with function is.integer, e.g.
is.integer(c(-3L, 0L, 2L, 5L, 6L))
• An example of a variable which naturally takes integer values
is “number of people” (you can’t have a fraction of a person)
• Mathematically denoted by \( \mathbb{Z} \)
Data types: Real/Double
• A real number is any number along an infinite number line
• They include fractions
• Denoted mathematically by \( \mathbb{R} \)
• Any numeric vector whose values are not followed by the letter “L” is
considered a double, e.g. c(-3, 0, 2, 5, 6). You can confirm a vector is a real
or double vector with function “is.double”, e.g. is.double(c(-3, 0, 2, 5,
6))
Data types: String/Character
• Composed of alphabetical letters and words/text
• Denoted by single or double quotation marks
• R has a built-in character vector of the lowercase alphabet; this is letters
• Example c("a", "b", "c"), letters, c('cats', 'and', 'dogs')
• Can check whether a vector is a character vector with function
is.character, e.g. is.character(letters)
Data type: Factors
• In R a factor vector is a categorical variable with discrete classification
(grouping)
• Example
cat <- factor(c(rep("Y", 28), rep("N", 10)))
is.factor(cat)
[1] TRUE
levels(cat)
[1] "N" "Y"
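Looking inside a factor shows how it relates to the basic types: the values are stored as integer codes that point into the levels. A small sketch, using a made-up "answers" vector:

```r
answers <- factor(c("Y", "N", "Y", "Y"))   # hypothetical responses

typeof(answers)       # storage type of the underlying values
class(answers)        # the class attribute that makes it a factor
levels(answers)       # distinct categories, sorted alphabetically by default
as.integer(answers)   # integer codes pointing into the levels
table(answers)        # counts per category
```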
Data type: Complex
• These are vectors whose values have a real and an imaginary part. The
imaginary part is denoted by the letter “i”
• Mathematically, they make it possible to take the square root of negative
values
# Example, complex vector
3+2i
[1] 3+2i
# Confirm it's complex
is.complex(3+2i)
[1] TRUE
Data type: Raw
• These are vectors containing raw bytes, the units in which computers store
data
• Closer to computer language (0’s and 1’s) than to human-readable language
• Integers and doubles are jointly referred to as numeric
• The most commonly used data types are logical, numeric and character.
Complex and raw data types are rarely used
int <- c(-3L, -2L, -1L, 0L, 1L, 2L, 3L)
is.integer(int)
[1] TRUE
is.numeric(int)
[1] TRUE
doub <- c(-3, -2, -1, 0, 1, 2, 3)
is.double(doub)
[1] TRUE
is.numeric(doub)
[1] TRUE
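The raw type mentioned above has no example in the slides, so here is a minimal sketch. It converts a single character to its byte and back; the variable name "bytes" is an assumption for illustration.

```r
bytes <- charToRaw("R")   # the byte that encodes the character "R"

bytes                     # raw values print in hexadecimal
typeof(bytes)             # storage type
as.integer(bytes)         # the same byte as a decimal number
rawToChar(bytes)          # convert the byte back to a character
```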
Data structures
• There are two broad types of data structures in R
– Atomic vectors
– Generic (list) vectors
• These structures have three properties
– Type
– Length and
– Attributes
• Function "typeof" is used to establish a vector’s type, function "length"
is used to determine its length and function "attributes" is used to get
additional information about a vector
• Atomic vectors and lists differ in their type: atomic vectors can only
contain one data type, while lists can contain any number of data types.
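The three properties can be inspected on any object. The sketch below uses a made-up named vector "scores" and a small list "mixed"; neither comes from the slides.

```r
scores <- c(math = 71.5, art = 64, music = 88)  # hypothetical named vector

typeof(scores)       # type: one type shared by all elements
length(scores)       # length: number of elements
attributes(scores)   # attributes: here only the names

mixed <- list(1L, "a", TRUE)   # a list may hold a different type per element
typeof(mixed)
sapply(mixed, typeof)
```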
Atomic Vectors
• Contain only one data type. They include one-dimensional atomic vectors,
two-dimensional atomic vectors called “matrices” and multi-dimensional atomic
vectors called “arrays”.
• Dimensionality can be considered as the number of indices required to address
any element in a vector, e.g. vector “cat” requires one index to address any
value; for example index “4” means the fourth value, which is Y
• Single variables are all atomic vectors of one dimension
• To check whether an object is atomic or a list, use is.atomic() or is.list().
Note there is also is.vector(), but it returns TRUE only when the object is a
vector (atomic or list) with no attributes other than names
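The checks above behave as follows on a named vector and on a matrix. This is a sketch with made-up objects "v" and "m".

```r
v <- c(a = 1, b = 2)          # one-dimensional atomic vector with names
is.atomic(v)
is.list(v)
is.vector(v)                  # names are the only attribute, so TRUE

m <- matrix(1:4, nrow = 2)    # two-dimensional atomic vector
is.atomic(m)
is.vector(m)                  # the dim attribute makes this FALSE
m[2, 1]                       # two indices are needed: row 2, column 1
```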
Atomic vectors: Matrices
• Two dimensional atomic vectors, they contain data of the same type
• Any atomic vector can be converted to a matrix by adding a dim attribute
cat <- c(rep("Y", 28), rep("N", 10))
typeof(cat)
[1] "character"
dim(cat)
NULL
is.matrix(cat)
[1] FALSE
dim(cat) <- c(19, 2)
typeof(cat)
[1] "character"
dim(cat)
[1] 19 2
is.matrix(cat)
[1] TRUE
• Other than using "dim()" to convert a one-dimensional atomic vector to a
multi-dimensional one, matrices can be created with "matrix()", or by coercing
another data object with "as.matrix()"
typeof(airmiles)
[1] "double"
airmiles2 <- matrix(airmiles, nrow = 8, ncol = 3)
is.matrix(airmiles2)
[1] TRUE
# as.matrix() has no nrow/ncol arguments; airmiles becomes a one-column matrix
airmiles3 <- as.matrix(airmiles)
is.matrix(airmiles3)
[1] TRUE
rm(airmiles2, airmiles3)
Special 1 & 2 dimension atomic vectors
Time series objects
• These are vectors used to store observations collected at given time points
(intervals) over a period of time, e.g. observations collected every three
months for five years.
• The distinguishing feature of this data is time. The interval is usually
constant, like three months (regular), but in other cases it might not be so
(irregular)
• In R, time series data are numeric vectors with attribute class equal to “ts”,
meaning time series
• Time series vectors can either be a one-dimensional atomic vector like the
“AirPassengers” data set in R or a two-dimensional matrix like "EuStockMarkets"
typeof(AirPassengers)
[1] "double"
attr(AirPassengers, "class")
[1] "ts"
typeof(EuStockMarkets)
[1] "double"
attr(EuStockMarkets, "class")
[1] "mts" "ts" "matrix"
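The "every three months for five years" case can be built with ts(). The values and the 2012 start date below are made up for illustration.

```r
# 20 quarterly observations: five years, starting in the first quarter of 2012
sales <- ts(seq(100, 195, by = 5), start = c(2012, 1), frequency = 4)

class(sales)       # the class attribute marks it as a time series
frequency(sales)   # observations per year
end(sales)         # year and quarter of the last observation

# Extract the four quarters of 2013 only
window(sales, start = c(2013, 1), end = c(2013, 4))
```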
Atomic vectors: Arrays
• Arrays are multi-dimensional atomic vectors.
• Matrices are two-dimensional arrays.
• Arrays with more than two dimensions are rarely used, but it’s good to know
they exist
• Created like matrices: with "dim()", e.g. dim(a) <- c(6, 2, 2), or with array()
or as.array()
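A sketch of both ways of creating the 6 x 2 x 2 array mentioned above, using the integers 1 to 24 as placeholder data:

```r
a <- 1:24
dim(a) <- c(6, 2, 2)          # 6 rows, 2 columns, 2 layers
is.array(a)

a[3, 2, 1]                    # three indices: row 3, column 2, layer 1

b <- array(1:24, dim = c(6, 2, 2))   # the same array built with array()
all.equal(a, b)
```

R fills arrays column by column, so the first six values form the first column of the first layer.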
Data structures: Generic vectors
• Lists are data structures which can contain more than one data type.
• There are two types of lists: two-dimensional lists called "data frames"
and ordinary "lists"
Data frames
• The most recognizable data structure
• A core data structure in R
• Present data in rows and columns like matrices, but in this case columns
can have different data types
# Example
head(faithful)
eruptions waiting
1 3.600 79
2 1.800 54
3 3.333 74
4 2.283 62
5 4.533 85
6 2.883 55
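Both columns of faithful are numeric, so the sketch below builds a small made-up data frame, "people", to show columns of different types in one structure.

```r
people <- data.frame(
  name   = c("Ann", "Ben", "Cat"),   # character column
  age    = c(31L, 45L, 28L),         # integer column
  member = c(TRUE, FALSE, TRUE),     # logical column
  stringsAsFactors = FALSE           # keep names as character, not factor
)

sapply(people, typeof)   # one type per column
is.list(people)          # a data frame is a list of equal-length columns
dim(people)              # rows and columns
```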
Generic vectors: Lists
• These are a unique data structure
• Can contain any number and type of objects, not just data. Can contain
sub-lists, hence they are also called recursive
• Created with function “list()”. Other structures can also be coerced to a list
with function “as.list()”
• We will create this structure in our next session
Importing and Exporting Data in R
• Data importation is also referred to as “reading in” data
• How data is read depends on the type and location of the file
• This sub-session covers reading in local R, text, Excel, database and other
statistical program files
• It also discusses web scraping
Reading in .RData
• Data created in R can be stored in an .RData file
• This could be any data structure or a collection of objects saved from an
active session (workspace)
• Function “save.image()” is used to store the whole workspace and function
“load()” is used to read in any “.RData” file. A “.Rhistory” file is plain text
holding the command history and is read with “loadhistory()” instead
# See current objects
ls()
[1] "cat" "doub" "int"
# Store in an external .RData file
save.image()
# Remove all objects from workspace/global environment
rm(list = ls())
ls()
character(0)
# Read in .RData
load(".RData")
# Check we have them back
ls()
[1] "cat" "doub" "int"
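save.image() writes the whole workspace to ".RData" in the working directory. When only some objects are needed, save() with an explicit file path is an alternative. The sketch below uses made-up objects "counts" and "weights" and a temporary file so that nothing in the working directory is overwritten.

```r
counts  <- c(-3L, 0L, 3L)    # hypothetical objects to store
weights <- c(-3, 0, 3)

rdata_file <- tempfile(fileext = ".RData")   # temporary file path
save(counts, weights, file = rdata_file)     # store only these two objects

rm(counts, weights)                          # remove them from the workspace
loaded <- load(rdata_file)                   # load() returns the restored names
loaded

unlink(rdata_file)                           # delete the temporary file
```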
R’s core importing function “read.table()”
• read.table is R’s core importing function
• Many other import functions, including some in contributed packages, build
on this function
• Reads a file and creates a data frame from it
• It has a number of wrapper functions (functions which provide a convenience
interface to another function, for example by supplying pre-defined/default
values, which makes function calls shorter to write)
• Wrapper functions include read.csv(), read.csv2(), read.delim() and
read.delim2()
• CSV files are comma-separated files
• Delim files are text files; the word delim means delimited, which refers to how
data values are separated, for example with tabs
• Both csv and delim files are relatively easy to read into R as long as the
separator/delimiter is known
• If the separator or delimiter is not known and the file cannot be opened, it is
best to read in a few lines with the readLines() function
Live demo (reading in CSV file)
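The steps above can be sketched without an external file by writing a small CSV to a temporary location first. The file contents and the names "scores" and "same" are made up for illustration.

```r
csv_file <- tempfile(fileext = ".csv")
writeLines(c("id,name,score",
             "1,Ann,71.5",
             "2,Ben,64",
             "3,Cat,88"), csv_file)

readLines(csv_file, n = 2)   # peek at the raw lines to spot the separator

# read.csv() is a wrapper around read.table() with CSV-friendly defaults
scores <- read.csv(csv_file, stringsAsFactors = FALSE)
same   <- read.table(csv_file, header = TRUE, sep = ",",
                     stringsAsFactors = FALSE)

str(scores)                  # a data frame with one column per field
all.equal(scores, same)

unlink(csv_file)             # delete the temporary file
```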
Reading in Excel files
• Base R does not have a function to read in Excel files
• But many contributed packages have functions to read them in
• The core reference for importing this type of file is one of the R project’s
manuals, R Data Import/Export, specifically chapter 9.
• The recommendation made there is to try to convert the Excel file into a “.csv”
(comma-separated) or “delim” (tab-separated) file. Live demo (reading excel file)
Reading in Databases data
• A bit of caution: database data tend to be large and R holds data in memory,
so it is not well suited to very large data. Hence read in only part of the data,
or look for ways to increase the memory available to R, such as using the cloud.
• Most Relational Database Management Systems (RDBMS) store data in tables
similar to R’s data frames, where columns are called “fields” and rows are called
“records”.
• Extracting part of a relational database requires database querying
semantics, the core of which is the SELECT statement.
• In general, a SELECT query uses:
– FROM to select the table
– WHERE to specify a condition for inclusion and
– ORDER BY to sort results (this is important as an RDBMS does not order
its rows like R’s data frames)
• There are a number of contributed packages on CRAN for reading RDBMS
data; these include RMySQL, DBI, ROracle, RPostgreSQL and RSQLite.
Live demo (reading in RDBMS and web data)
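A SELECT query with FROM, WHERE and ORDER BY can be tried from R against a throwaway in-memory SQLite database. This sketch assumes the contributed packages DBI and RSQLite are installed and skips the query otherwise. The table name "faithful" and the threshold of 90 are illustrative choices.

```r
long_waits <- NULL   # stays NULL if the packages are not installed

if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  # Connect to a temporary in-memory database and copy a data frame into it
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "faithful", faithful)

  # Read back only the rows of interest, sorted by waiting time
  long_waits <- DBI::dbGetQuery(
    con,
    "SELECT eruptions, waiting
       FROM faithful
      WHERE waiting > 90
      ORDER BY waiting DESC"
  )

  DBI::dbDisconnect(con)
}

long_waits
```

Filtering inside the database means only the selected records are brought into R's memory.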
From other statistical software
• Other statistical software whose data files are often read into R includes
SPSS, SAS, Stata and EpiInfo
• As with Excel and database data, a package must be used to read in these files
• The recommended package is "foreign"; other packages include
"readstata13" and "haven".
Live demo (reading SPSS and Stata data files)

Más contenido relacionado

La actualidad más candente

Standard template library
Standard template libraryStandard template library
Standard template librarySukriti Singh
 
Standard Template Library
Standard Template LibraryStandard Template Library
Standard Template LibraryGauravPatil318
 
Unit 1 introduction to data structure
Unit 1   introduction to data structureUnit 1   introduction to data structure
Unit 1 introduction to data structurekalyanineve
 
Abstract Algebra and Category Theory
Abstract Algebra and Category Theory Abstract Algebra and Category Theory
Abstract Algebra and Category Theory Naveenkumar Muguda
 
Data structures & algorithms lecture 3
Data structures & algorithms lecture 3Data structures & algorithms lecture 3
Data structures & algorithms lecture 3Poojith Chowdhary
 
2nd puc computer science chapter 3 data structures 1
2nd puc computer science chapter 3 data structures 12nd puc computer science chapter 3 data structures 1
2nd puc computer science chapter 3 data structures 1Aahwini Esware gowda
 
How to choose best containers in STL (C++)
How to choose best containers in STL (C++)How to choose best containers in STL (C++)
How to choose best containers in STL (C++)Sangharsh agarwal
 
Introduction To R Language
Introduction To R LanguageIntroduction To R Language
Introduction To R LanguageGaurang Dobariya
 
Data Structures (CS8391)
Data Structures (CS8391)Data Structures (CS8391)
Data Structures (CS8391)Elavarasi K
 
Data structure &amp; algorithms introduction
Data structure &amp; algorithms introductionData structure &amp; algorithms introduction
Data structure &amp; algorithms introductionSugandh Wafai
 
Set data structure
Set data structure Set data structure
Set data structure Tech_MX
 
Arrays in Data Structure and Algorithm
Arrays in Data Structure and Algorithm Arrays in Data Structure and Algorithm
Arrays in Data Structure and Algorithm KristinaBorooah
 
Introduction to R programming
Introduction to R programmingIntroduction to R programming
Introduction to R programmingAlberto Labarga
 

La actualidad más candente (20)

STL in C++
STL in C++STL in C++
STL in C++
 
Standard template library
Standard template libraryStandard template library
Standard template library
 
Standard Template Library
Standard Template LibraryStandard Template Library
Standard Template Library
 
Lecture-05-DSA
Lecture-05-DSALecture-05-DSA
Lecture-05-DSA
 
Data structures in c#
Data structures in c#Data structures in c#
Data structures in c#
 
Basic data-structures-v.1.1
Basic data-structures-v.1.1Basic data-structures-v.1.1
Basic data-structures-v.1.1
 
Unit 1 introduction to data structure
Unit 1   introduction to data structureUnit 1   introduction to data structure
Unit 1 introduction to data structure
 
Abstract Algebra and Category Theory
Abstract Algebra and Category Theory Abstract Algebra and Category Theory
Abstract Algebra and Category Theory
 
Data structures & algorithms lecture 3
Data structures & algorithms lecture 3Data structures & algorithms lecture 3
Data structures & algorithms lecture 3
 
2nd puc computer science chapter 3 data structures 1
2nd puc computer science chapter 3 data structures 12nd puc computer science chapter 3 data structures 1
2nd puc computer science chapter 3 data structures 1
 
Data structure ppt
Data structure pptData structure ppt
Data structure ppt
 
Data Structures (BE)
Data Structures (BE)Data Structures (BE)
Data Structures (BE)
 
How to choose best containers in STL (C++)
How to choose best containers in STL (C++)How to choose best containers in STL (C++)
How to choose best containers in STL (C++)
 
Introduction To R Language
Introduction To R LanguageIntroduction To R Language
Introduction To R Language
 
Data Structures (CS8391)
Data Structures (CS8391)Data Structures (CS8391)
Data Structures (CS8391)
 
Data structure &amp; algorithms introduction
Data structure &amp; algorithms introductionData structure &amp; algorithms introduction
Data structure &amp; algorithms introduction
 
Set data structure
Set data structure Set data structure
Set data structure
 
Arrays in Data Structure and Algorithm
Arrays in Data Structure and Algorithm Arrays in Data Structure and Algorithm
Arrays in Data Structure and Algorithm
 
Introduction to R programming
Introduction to R programmingIntroduction to R programming
Introduction to R programming
 
Data structures using C
Data structures using CData structures using C
Data structures using C
 

Similar a R training2

Introduction to R _IMPORTANT FOR DATA ANALYTICS
Introduction to R _IMPORTANT FOR DATA ANALYTICSIntroduction to R _IMPORTANT FOR DATA ANALYTICS
Introduction to R _IMPORTANT FOR DATA ANALYTICSHaritikaChhatwal1
 
R-programming-training-in-mumbai
R-programming-training-in-mumbaiR-programming-training-in-mumbai
R-programming-training-in-mumbaiUnmesh Baile
 
Learning notes of r for python programmer (Temp1)
Learning notes of r for python programmer (Temp1)Learning notes of r for python programmer (Temp1)
Learning notes of r for python programmer (Temp1)Chia-Chi Chang
 
Introduction to R.pptx
Introduction to R.pptxIntroduction to R.pptx
Introduction to R.pptxkarthikks82
 
Python - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning LibrariesPython - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning LibrariesAndrew Ferlitsch
 
standard template library(STL) in C++
standard template library(STL) in C++standard template library(STL) in C++
standard template library(STL) in C++•sreejith •sree
 
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...HendraPurnama31
 
Basic of array and data structure, data structure basics, array, address calc...
Basic of array and data structure, data structure basics, array, address calc...Basic of array and data structure, data structure basics, array, address calc...
Basic of array and data structure, data structure basics, array, address calc...nsitlokeshjain
 
python-numpyandpandas-170922144956 (1).pptx
python-numpyandpandas-170922144956 (1).pptxpython-numpyandpandas-170922144956 (1).pptx
python-numpyandpandas-170922144956 (1).pptxAkashgupta517936
 
R for Pirates. ESCCONF October 27, 2011
R for Pirates. ESCCONF October 27, 2011R for Pirates. ESCCONF October 27, 2011
R for Pirates. ESCCONF October 27, 2011Mandi Walls
 

Similar a R training2 (20)

R data types
R data typesR data types
R data types
 
Language R
Language RLanguage R
Language R
 
Data Types of R.pptx
Data Types of R.pptxData Types of R.pptx
Data Types of R.pptx
 
R programming by ganesh kavhar
R programming by ganesh kavharR programming by ganesh kavhar
R programming by ganesh kavhar
 
Introduction to R _IMPORTANT FOR DATA ANALYTICS
Introduction to R _IMPORTANT FOR DATA ANALYTICSIntroduction to R _IMPORTANT FOR DATA ANALYTICS
Introduction to R _IMPORTANT FOR DATA ANALYTICS
 
Session 4
Session 4Session 4
Session 4
 
R-programming-training-in-mumbai
R-programming-training-in-mumbaiR-programming-training-in-mumbai
R-programming-training-in-mumbai
 
Ggplot2 v3
Ggplot2 v3Ggplot2 v3
Ggplot2 v3
 
Learning notes of r for python programmer (Temp1)
Learning notes of r for python programmer (Temp1)Learning notes of r for python programmer (Temp1)
Learning notes of r for python programmer (Temp1)
 
Introduction to R.pptx
Introduction to R.pptxIntroduction to R.pptx
Introduction to R.pptx
 
Array
ArrayArray
Array
 
Python - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning LibrariesPython - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning Libraries
 
standard template library(STL) in C++
standard template library(STL) in C++standard template library(STL) in C++
standard template library(STL) in C++
 
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...
 
Basic of array and data structure, data structure basics, array, address calc...
Basic of array and data structure, data structure basics, array, address calc...Basic of array and data structure, data structure basics, array, address calc...
Basic of array and data structure, data structure basics, array, address calc...
 
R language tutorial.pptx
R language tutorial.pptxR language tutorial.pptx
R language tutorial.pptx
 
python-numpyandpandas-170922144956 (1).pptx
python-numpyandpandas-170922144956 (1).pptxpython-numpyandpandas-170922144956 (1).pptx
python-numpyandpandas-170922144956 (1).pptx
 
R for Pirates. ESCCONF October 27, 2011
R for Pirates. ESCCONF October 27, 2011R for Pirates. ESCCONF October 27, 2011
R for Pirates. ESCCONF October 27, 2011
 
Statistics lab 1
Statistics lab 1Statistics lab 1
Statistics lab 1
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 

Más de Hellen Gakuruh

Prelude to level_three
Prelude to level_threePrelude to level_three
Prelude to level_threeHellen Gakuruh
 
SessionThree_IntroductionToVersionControlSystems
SessionThree_IntroductionToVersionControlSystemsSessionThree_IntroductionToVersionControlSystems
SessionThree_IntroductionToVersionControlSystemsHellen Gakuruh
 
Introduction_to_Regular_Expressions_in_R
Introduction_to_Regular_Expressions_in_RIntroduction_to_Regular_Expressions_in_R
Introduction_to_Regular_Expressions_in_RHellen Gakuruh
 
SessionTen_CaseStudies
SessionTen_CaseStudiesSessionTen_CaseStudies
SessionTen_CaseStudiesHellen Gakuruh
 
SessionNine_HowandWheretoGetHelp
SessionNine_HowandWheretoGetHelpSessionNine_HowandWheretoGetHelp
SessionNine_HowandWheretoGetHelpHellen Gakuruh
 
SessionEight_PlottingInBaseR
SessionEight_PlottingInBaseRSessionEight_PlottingInBaseR
SessionEight_PlottingInBaseRHellen Gakuruh
 
SessionSeven_WorkingWithDatesandTime
SessionSeven_WorkingWithDatesandTimeSessionSeven_WorkingWithDatesandTime
SessionSeven_WorkingWithDatesandTimeHellen Gakuruh
 
SessionSix_TransformingManipulatingDataObjects
SessionSix_TransformingManipulatingDataObjectsSessionSix_TransformingManipulatingDataObjects
SessionSix_TransformingManipulatingDataObjectsHellen Gakuruh
 
SessionFive_ImportingandExportingData
SessionFive_ImportingandExportingDataSessionFive_ImportingandExportingData
SessionFive_ImportingandExportingDataHellen Gakuruh
 
SessionFour_DataTypesandObjects
SessionFour_DataTypesandObjectsSessionFour_DataTypesandObjects
SessionFour_DataTypesandObjectsHellen Gakuruh
 
SessionTwo_MakingFunctionCalls
SessionTwo_MakingFunctionCallsSessionTwo_MakingFunctionCalls
SessionTwo_MakingFunctionCallsHellen Gakuruh
 

Más de Hellen Gakuruh (20)

R training6
R training6R training6
R training6
 
R training5
R training5R training5
R training5
 
R training4
R training4R training4
R training4
 
R training
R trainingR training
R training
 
Prelude to level_three
Prelude to level_threePrelude to level_three
Prelude to level_three
 
Prelude to level_two
Prelude to level_twoPrelude to level_two
Prelude to level_two
 
SessionThree_IntroductionToVersionControlSystems
SessionThree_IntroductionToVersionControlSystemsSessionThree_IntroductionToVersionControlSystems
SessionThree_IntroductionToVersionControlSystems
 
Day 2
Day 2Day 2
Day 2
 
Day 1
Day 1Day 1
Day 1
 
Introduction_to_Regular_Expressions_in_R
Introduction_to_Regular_Expressions_in_RIntroduction_to_Regular_Expressions_in_R
Introduction_to_Regular_Expressions_in_R
 
SessionTen_CaseStudies
SessionTen_CaseStudiesSessionTen_CaseStudies
SessionTen_CaseStudies
 
webScrapingFunctions
webScrapingFunctionswebScrapingFunctions
webScrapingFunctions
 
SessionNine_HowandWheretoGetHelp
SessionNine_HowandWheretoGetHelpSessionNine_HowandWheretoGetHelp
SessionNine_HowandWheretoGetHelp
 
SessionEight_PlottingInBaseR
SessionEight_PlottingInBaseRSessionEight_PlottingInBaseR
SessionEight_PlottingInBaseR
 
SessionSeven_WorkingWithDatesandTime
SessionSeven_WorkingWithDatesandTimeSessionSeven_WorkingWithDatesandTime
SessionSeven_WorkingWithDatesandTime
 
SessionSix_TransformingManipulatingDataObjects
SessionSix_TransformingManipulatingDataObjectsSessionSix_TransformingManipulatingDataObjects
SessionSix_TransformingManipulatingDataObjects
 
Files
FilesFiles
Files
 
SessionFive_ImportingandExportingData
SessionFive_ImportingandExportingDataSessionFive_ImportingandExportingData
SessionFive_ImportingandExportingData
 
SessionFour_DataTypesandObjects
SessionFour_DataTypesandObjectsSessionFour_DataTypesandObjects
SessionFour_DataTypesandObjects
 
SessionTwo_MakingFunctionCalls
SessionTwo_MakingFunctionCallsSessionTwo_MakingFunctionCalls
SessionTwo_MakingFunctionCalls
 

Último

Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 

Último (20)

Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf

R training2

  • 1. Introduction to Data Analysis and Graphics2
    Hellen Gakuruh
    2017-03-07
    Session Two
    Vector and Assignment, Data Objects and Data Importation
    Outline
    By the end of this session we will have knowledge of:
    • Vectors and assignment
    • Data types
    • Data structures and
    • Importing data into R
    Vector and Assignment
    • The simplest data structure in R is a vector. From a data point of view, a vector is a collection of elements. These elements can be numeric values, alphabetical characters, logical values, or date and time values.
    • Vectors are created with function "c", which means "concatenate", e.g. a numerical vector c(1, 5, 6, 8)
    • These vectors can be named by using the assignment operator "<-" or function "assign()", e.g. to assign vector c(1, 5, 6, 8) to name "num": num <- c(1, 5, 6, 8) or assign("num", c(1, 5, 6, 8)). We usually use "<-" for assignment; the "assign" function is mostly used when developing functions
    • A vector can be of any length beginning from 1 to about 2.1474836 × 10^9 (that is, 2^31 - 1) elements
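The two ways of naming a vector described in slide 1 can be tried side by side. This is a minimal sketch; the name num2 is introduced here only for illustration and does not appear in the slides.

```r
# Create a numeric vector and bind it to the name "num"
num <- c(1, 5, 6, 8)

# assign() does the same job, taking the name as a character string
assign("num2", c(1, 5, 6, 8))

identical(num, num2)   # TRUE: both names refer to equal vectors
length(num)            # 4 elements

# Largest value an R integer can hold, about 2.1 x 10^9
.Machine$integer.max
```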
  • 2. Data types
    This session works with seven data types:
    • Logical
    • Integer
    • Real/Double
    • String/Character
    • Factor
    • Complex
    • Raw
    • The R manuals specify six basic types: logical, integer, double, character, complex and raw. A factor is treated here as a seventh data type; internally it is stored as an integer vector with a "levels" attribute and class "factor".
    • In this sub-section we introduce these data types
    Data types: Logical
    • These are vectors with only TRUE and FALSE values, like c(TRUE, TRUE, FALSE, TRUE, FALSE)
    • Can be considered as binary vectors in analysis
    • Besides categorical variables that hold these values, logical vectors are often produced by comparison operators such as "<", ">", "<=", ">=", "==" and "!=", and combined with the logical operators "|", "||", "&" and "&&"
    • During analysis, these vectors can be coerced to numeric values, in which case TRUE becomes 1 and FALSE becomes 0
    • These vectors can include the value "NA", which in R means "Not Available", a placeholder for missing values.
    • Most operations on a vector containing NA result in NA, since the missing value is unknown
    Data types: Integer
    • These are positive and negative whole numbers {. . . , -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, . . . }
    • In R, integers are denoted with the letter L, e.g. c(-3L, 0L, 2L, 5L, 6L). You can confirm it's an integer vector with function is.integer(c(-3L, 0L, 2L, 5L, 6L))
    • An example of a variable which naturally takes integer values is "number of people" (you can't have a fraction of a person)
    • Mathematically denoted by \( \mathbb{Z} \)
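The behaviour of logical vectors, NA and the L suffix described in slide 2 can be checked with a few lines. This is a small sketch; the vector scores and the cut-off of 10 are made-up values used only for illustration.

```r
# Hypothetical data with one missing value
scores <- c(12, 7, NA, 15, 9)

# A comparison operator returns a logical vector; the NA stays NA
passed <- scores >= 10
passed                              # TRUE FALSE NA TRUE FALSE

# TRUE counts as 1 and FALSE as 0, but NA propagates unless removed
sum(passed)                         # NA
sum(passed, na.rm = TRUE)           # 2

# Explicit coercion to numbers
as.integer(c(TRUE, FALSE, TRUE))    # 1 0 1

# The L suffix is what makes a value an integer
is.integer(c(-3L, 0L, 2L))          # TRUE
is.integer(c(-3, 0, 2))             # FALSE: these are doubles
```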
  • 3. Real/Double
    • A real number is any number along an infinite number line
    • They include fractions
    • Denoted mathematically with \( \mathbb{R} \)
    • Any numeric vector whose values are not followed by the letter "L" is considered a double, e.g. c(-3, 0, 2, 5, 6). You can confirm a vector is a real or double vector with function "is.double", e.g. is.double(c(-3, 0, 2, 5, 6))
    String/Character
    • Composed of alphabetical letters and words/text
    • Denoted by single or double quotation marks
    • R has a built-in vector of the lower-case alphabetical letters; this is letters
    • Examples: c("a", "b", "c"), letters, c('cats', 'and', 'dogs')
    • You can check whether a vector is a character vector with function is.character, e.g. is.character(letters)
    Data type: Factors
    • In R a factor vector is a categorical variable with discrete classification (grouping)
    • Example
        cat <- factor(c(rep("Y", 28), rep("N", 10)))
        is.factor(cat)
        [1] TRUE
        levels(cat)
        [1] "N" "Y"
    Data type: Complex
    • These are vectors with real and imaginary parts. Imaginary numbers are denoted by the letter "i"
    • Mathematically, they make it possible to take the square root of negative values
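To see how the factor and complex types from slide 3 behave, the following sketch looks inside a small factor and takes the square root of a negative value. The vector answers is a hypothetical example, not data from the slides.

```r
# Hypothetical yes/no responses stored as a factor
answers <- factor(c("Y", "N", "Y", "Y"))

levels(answers)        # "N" "Y": categories, sorted alphabetically by default
as.integer(answers)    # 2 1 2 2: integer codes pointing into the levels
typeof(answers)        # "integer": how a factor is stored internally
table(answers)         # counts per category

# The square root of a negative value needs complex input
sqrt(as.complex(-4))   # 0+2i
```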
  • 4. # Example, complex vector
        3+2i
        [1] 3+2i
        # Confirm it's complex
        is.complex(3+2i)
        [1] TRUE
    Data type: Raw
    • These are vectors that hold raw bytes, the computer's own data storage units
    • More of a computer language (0's and 1's) than a human-readable language
    • Integers and doubles are jointly referred to as numeric
    • The most commonly used data types are logical, numeric and character. Complex and raw data types are rarely used
        int <- c(-3L, -2L, -1L, 0L, 1L, 2L, 3L)
        is.integer(int)
        [1] TRUE
        is.numeric(int)
        [1] TRUE
        doub <- c(-3, -2, -1, 0, 1, 2, 3)
        is.double(doub)
        [1] TRUE
        is.numeric(doub)
        [1] TRUE
    Data structures
    • There are two broad types of data structures in R
      – Atomic vectors
      – Generic (list) vectors
    • These structures have three properties
      – Type
      – Length and
      – Attributes
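Slide 4 mentions raw vectors and the three properties of a vector without showing them. A brief sketch, assuming a made-up named vector x and the single character "R" as input:

```r
# A raw vector holding the byte that encodes the text "R"
bytes <- charToRaw("R")
bytes                  # printed in hexadecimal: 52
typeof(bytes)          # "raw"

# The three properties of a vector: type, length and attributes
x <- c(a = 1.5, b = 2, c = 3.25)
typeof(x)              # "double"
length(x)              # 3
attributes(x)          # a list holding the names "a" "b" "c"
```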
  • 5. • Function "typeof" is used to establish a vector's type, function "length" is used to determine its length and function "attributes" is used to get additional information about a vector
    • Atomic vectors and lists differ in their type: atomic vectors can only contain one data type, while lists can contain any number of data types.
    Atomic Vectors
    • Contain only one data type. They include 1-dimensional atomic vectors, 2-dimensional atomic vectors called "matrices" and multi-dimensional atomic vectors called "arrays".
    • Dimensionality can be considered as the number of indices required to address any element in a vector, e.g. vector "cat" requires one index to address any value; for example, index "4" means the fourth value, which is Y
    • Single variables are all atomic vectors of one dimension
    • To check if a vector is atomic or a list, use is.atomic() or is.list(). Note there is also is.vector(), but it returns TRUE only when the vector has no attributes other than names, so it is not a general test for vectors
    Atomic vectors: Matrices
    • Two-dimensional atomic vectors; they contain data of the same type
    • Any atomic vector can be converted to a matrix by adding a dim attribute
        cat <- c(rep("Y", 28), rep("N", 10))
        typeof(cat)
        [1] "character"
        dim(cat)
        NULL
        is.matrix(cat)
        [1] FALSE
        dim(cat) <- c(19, 2)
        typeof(cat)
        [1] "character"
        dim(cat)
        [1] 19 2
        is.matrix(cat)
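The idea of dimensionality as "number of indices needed" and the behaviour of is.vector() from slide 5 can be illustrated as follows. The vectors v and m are hypothetical names used only in this sketch.

```r
v <- c(2, 4, 6, 8, 10, 12)

is.atomic(v)       # TRUE
is.list(v)         # FALSE
is.vector(v)       # TRUE: no attributes at all

# Adding a dim attribute turns a copy of v into a 3 x 2 matrix
m <- v
dim(m) <- c(3, 2)
is.matrix(m)       # TRUE
is.vector(m)       # FALSE: the dim attribute is not allowed by is.vector()

# One index addresses an element of v, two indices address it in m.
# Matrices are filled column by column, so the fourth value sits in
# row 1 of column 2.
v[4]               # 8
m[1, 2]            # 8
```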
  • 6.     [1] TRUE
    • Other than using "dim()" to convert a one-dimensional atomic vector to a multi-dimensional one, matrices can be created with "matrix()", or by coercing another data object with "as.matrix()"
        typeof(airmiles)
        [1] "double"
        airmiles2 <- matrix(airmiles, nrow = 8, ncol = 3)
        is.matrix(airmiles2)
        [1] TRUE
        airmiles3 <- as.matrix(airmiles, nrow = 8, ncol = 3)
        is.matrix(airmiles3)
        [1] TRUE
        rm(airmiles2, airmiles3)
    Special 1 & 2 dimension atomic vectors
    Time series objects
    • These are vectors used to store observations collected at given time points (intervals) over a period of time, e.g. observations collected every three months for five years.
    • The distinguishing feature of this data is time. The interval is usually constant, like three months (regular), but in other cases it might not be (irregular)
    • In R, time series data are numeric vectors with attribute class equal to "ts", meaning time series
    • Time series vectors can either be a 1-dimensional atomic vector like the "AirPassengers" data set in R or a 2-dimensional matrix like "EuStockMarkets"
        typeof(AirPassengers)
        [1] "double"
        attr(AirPassengers, "class")
        [1] "ts"
        typeof(EuStockMarkets)
        [1] "double"
        attr(EuStockMarkets, "class")
        [1] "mts" "ts" "matrix"
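A regular time series like the quarterly example mentioned in slide 6 can be built from an ordinary numeric vector with ts(). This is a sketch with invented values and an assumed start of the first quarter of 2015.

```r
# Hypothetical observations collected every three months for two years
obs <- c(12, 15, 14, 18, 13, 16, 15, 20)

# frequency = 4 means four observations per year (quarterly)
qts <- ts(obs, start = c(2015, 1), frequency = 4)

class(qts)         # "ts"
typeof(qts)        # "double": still a numeric vector underneath
frequency(qts)     # 4
qts                # printed with years as rows and quarters as columns

# Extract the second year only
window(qts, start = c(2016, 1))
```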
  • 7. Atomic vectors: Arrays
    • Arrays are multi-dimensional atomic vectors.
    • Matrices are two-dimensional arrays.
    • They are rarely used, but it's good to know they exist
    • Created like matrices: with "dim()", e.g. dim(a) <- c(6, 2, 2), or with array() or as.array()
    Data structures: Generic vectors
    • Lists are data structures which can contain more than one data type.
    • There are two types of lists: two-dimensional lists called "data frames" and general "lists"
    Data frames
    • Most recognizable data structure
    • A core data structure in R
    • Present data in rows and columns like matrices, but in this case columns can have different data types
        # Example
        head(faithful)
          eruptions waiting
        1     3.600      79
        2     1.800      54
        3     3.333      74
        4     2.283      62
        5     4.533      85
        6     2.883      55
    Generic vectors: Lists
    • These are a unique data structure
    • Can contain any number and type of object, not just data. Can contain sub-lists, hence they are also called recursive
    • Created with function "list()". Other structures can also be coerced to a list with function "as.list()"
    • We will create this structure in our next session
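The array and data frame structures from slide 7 can be sketched as below. The objects a and df and their contents are made up for illustration; stringsAsFactors = FALSE is set so the result is the same across R versions.

```r
# A 6 x 2 x 2 array needs three indices to address one element
a <- array(1:24, dim = c(6, 2, 2))
dim(a)             # 6 2 2
a[1, 2, 2]         # 19: row 1, column 2 of the second 6 x 2 slice

# A data frame: columns of equal length, each with its own type
df <- data.frame(
  id   = 1:3,
  name = c("a", "b", "c"),
  ok   = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
sapply(df, typeof) # "integer" "character" "logical"

# A data frame is a list underneath, which is why column types can differ
is.list(df)        # TRUE
dim(df)            # 3 3
```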
  • 8. Importing and Exporting Data in R
    • Data importation is also referred to as "reading in" data
    • How data is read depends on the type and location of the file
    • This sub-session covers reading in local R, text, Excel, database and other statistical program files
    • Web scraping is also discussed
    Reading in .RData
    • Data created in R can be stored in an RData file
    • This could be any data structure or a collection of objects saved from the active workspace (global environment)
    • Function "save.image()" is used to store the workspace, and function "load" is used to read in any ".RData" file (a ".Rhistory" file is read with loadhistory() instead)
        # See current objects
        ls()
        [1] "cat" "doub" "int"
        # Store in an external .RData file
        save.image()
        # Remove all objects from workspace/global environment
        rm(list = ls())
        ls()
        character(0)
        # Read in .RData
        load(".RData")
        # Check we have them back
        ls()
        [1] "cat" "doub" "int"
    R's core importing function "read.table()"
    • read.table is R's core importing function
    • Many other import functions, including some in contributed packages, build on this function
    • Reads a file and creates a data frame from it
    • It has a number of wrapper functions (functions which provide a convenience interface to another function, for example by supplying pre-defined/default values, which makes function calls shorter)
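The save/load cycle and read.table() from slide 8 can be tried without touching the real workspace file by writing to temporary files. This is a sketch; heights, the file names and the file contents are invented for the example.

```r
# Save a single object to a temporary .RData file, remove it, load it back
rdata_file <- tempfile(fileext = ".RData")
heights <- c(150, 162, 171)
save(heights, file = rdata_file)
rm(heights)
load(rdata_file)   # restores the object under its original name
heights

# Write a small space-separated text file and read it with read.table()
txt_file <- tempfile(fileext = ".txt")
writeLines(c("id score", "1 3.5", "2 4.0"), txt_file)
dat <- read.table(txt_file, header = TRUE)
str(dat)           # a data frame with 2 rows and columns id and score
```

Unlike save.image(), which stores every object in the workspace, save() stores only the objects named in the call.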
  • 9. • Wrapper functions include read.csv(), read.csv2(), read.delim() and read.delim2()
    • CSV files are comma-separated files
    • Delim files are text files; the word delim means delimited, which implies how data are separated, for example with tabs
    • Both csv and delim files are relatively easy to read into R as long as the separators/delimiters are known
    • In case the separator or delimiter is not known and the file cannot be opened, it is best to read in a few lines with the readLines function
    Live demo (reading in CSV file)
    Reading in Excel files
    • Base R does not have a function to read in Excel-based files
    • But many contributed packages have functions to read them in
    • The core reference for importing this type of file is one of the R project's manuals, R Data Import/Export, specifically chapter 9.
    • The recommendation made there is to try to convert the Excel file into a ".csv" (comma-separated) or "delim" (tab-separated) file.
    Live demo (reading excel file)
    Reading in Databases data
    • A bit of caution: database data tend to be large and R is not too good when it comes to large data, hence read in part of the data or look for ways to increase the memory available to R processes, such as using a cloud machine.
    • Most Relational Database Management Systems (RDBMS) hold data similar to R's data frames, where columns are called "fields" and rows are called "records".
    • Extracting part of a relational database requires use of database querying semantics, the core of which is the SELECT statement.
    • In general, a SELECT query uses:
      – FROM to select the table
      – WHERE to specify a condition for inclusion and
      – ORDER BY to sort results (this is important as an RDBMS does not order its rows like R's data frames)
    • There are a number of contributed packages on CRAN for reading RDBMS data; these include RMySQL, DBI, ROracle, RPostgreSQL and RSQLite.
    Live demo (reading in RDMS and web data)
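When the separator of a text file is unknown, as discussed in slide 9, a few lines can be inspected with readLines() before choosing a wrapper function. The sketch below uses a made-up semicolon-separated file with decimal commas, which is the layout read.csv2() expects by default.

```r
# Create a hypothetical file whose separator we pretend not to know
f <- tempfile(fileext = ".txt")
writeLines(c("id;name;score", "1;Ann;3,5", "2;Ben;4,0"), f)

# Peek at the first two lines to identify the separator
readLines(f, n = 2)

# Semicolon separator and decimal comma: read.csv2() has these defaults
dat <- read.csv2(f)
dat$score          # 3.5 4.0, read as numbers rather than text

# The same call written out with read.table()
dat2 <- read.table(f, header = TRUE, sep = ";", dec = ",")
```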
  • 10. From other statistical software
    • Other statistical software whose data files are often read into R includes SPSS, SAS, Stata and EpiInfo
    • Like Excel and database data, a package must be used to read in these files
    • The recommended package is "foreign"; other packages include "readstata13" and "haven".
    Live demo (reading SPSS and Stata data files)