Data Analysis and Programming in R
Eswar Sai Santosh Bandaru
R
• What is R?
• Programming language meant for statistical analysis, data mining
• https://en.wikipedia.org/wiki/R_(programming_language)
• Why R?
• Effective data manipulation, storage and graphical display
• Free of cost, open source
• Many packages contributed by experienced programmers/ statisticians
• https://cran.r-project.org/web/packages/available_packages_by_name.html
• Simple and elegant code, easy to learn
• Microsoft is integrating R into SQL Server
• Problems:
• Memory management: data is held in RAM
• Speed
• Many developments are under way to address these problems.
RStudio interface: Console
• Console: run your code here
RStudio interface: Editor
• Editor: save and edit your code here
RStudio interface: Output
• Output: plots and help
General Things:
• Case sensitive
• Shortcuts:
• CTRL+ENTER (Important): Send code from editor to console and execute
• CTRL+2: Move the cursor from editor to console
• CTRL+1: Move the cursor from console to editor
• CTRL+UP in console: Retrieve previous commands
• # hash is used for commenting the code
• CTRL+SHIFT+C: comment/uncomment a block of code
R as a calculator
• + : Addition -- 2+3 output: 5
• - : Subtraction -- 4-5 output: -1
• * : Multiplication -- 2*3 output: 6
• ^ or ** : Exponentiation -- 2^3 or 2**3 output: 8
• / : Division -- 17/3 output: 5.666667
• %% : Modulo division -- 17%%3 output: 2
• %/% : Integer division -- 17%/%3 output: 5
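The operators listed above can be tried directly in the console. A minimal sketch; the variable names are illustrative and not part of the original slides:

```r
# Basic arithmetic; results are stored so they can be inspected afterwards
add_res  <- 2 + 3      # addition: 5
sub_res  <- 4 - 5      # subtraction: -1
mul_res  <- 2 * 3      # multiplication: 6
pow_res  <- 2^3        # exponentiation: 8 (2**3 gives the same result)
div_res  <- 17 / 3     # division: 5.666667
mod_res  <- 17 %% 3    # modulo (remainder): 2
idiv_res <- 17 %/% 3   # integer division: 5
```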
Assignments and Expressions
• "<-" is the assignment operator in R
• a <- 3: 3 gets assigned to variable a
• Expressions
• Combination of numbers/variables/operators
• E.g., 2+3*a/14
• Order of evaluation:
• BRACKETS -> EXPONENTIATION -> UNARY MINUS -> MULTIPLICATION/DIVISION (equal precedence, evaluated left to right) -> ADDITION/SUBTRACTION
• E.g., 7*9/13 -- 4.846154
• -2^0.5 -- -1.414214 (exponentiation is applied before the minus sign)
• (-2)^0.5 -- NaN
• Q1
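The evaluation order can be checked with a few expressions. This is a small sketch; the names expr1 to expr4 are hypothetical and only there to hold the results:

```r
a <- 3                    # 3 is assigned to the variable a
expr1 <- 2 + 3 * a / 14   # 3 * a / 14 is evaluated first, then 2 is added: 2.642857
expr2 <- 7 * 9 / 13       # * and / run left to right: (7 * 9) / 13 = 4.846154
expr3 <- -2^0.5           # ^ binds tighter than the unary minus: -(2^0.5) = -1.414214
expr4 <- (-2)^0.5         # fractional power of a negative real number: NaN
```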
Data Types
• Numeric: Real numbers. E.g., 1.24, -3.12, 1
• Integer: Integer values, written with the suffix L. E.g., 24L
• Character: E.g., 'a', "a", "Hello World!", "2"
• Logical: Boolean type. TRUE (1), FALSE (0); T and F are shorthand
• Complex: a+bi, where a and b are real numbers
• class(): function used to check the class of an object
• E.g., class(24) -- numeric
• E.g., class(24L) -- integer
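As a quick check of the types above, class() can be applied to one value of each kind. The example values and the name types are illustrative:

```r
# One value per basic data type, checked with class()
types <- c(
  class(1.24),            # "numeric"
  class(24L),             # "integer"
  class("Hello World!"),  # "character"
  class(TRUE),            # "logical"
  class(2 + 3i)           # "complex"
)
types
```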
Data structures
• 4 main types:
• Vectors
• Matrices
• Lists
• Data frames
• We will discuss vectors and data frames in today's session
Vectors:
• One-dimensional collection of elements of the same kind (same data type)
• Vectors in R are similar to arrays in other programming languages
• Notation: (1,2,3,4,5), where 1,2,3,4,5 are called elements
• (1,2,3,4,5): numeric vector
• ('a','b','c','d'): character vector
• (T, F, T, T): logical vector
• (1L,2L,3L): integer vector
• (1,2,3,4,6) ----- valid vector
• (1,'a',3,'t') ------ mixed types, so not a valid vector as written (but R doesn't throw an error, due to coercion)
Creating vectors
• Basic ways:
• Using c()
• Using “:”
• Using seq()
• Using rep()
• Using vector()
c(): the combine function
• Syntax:
• X <- c(1,2,4,78,90) creates a numeric vector X with elements 1,2,4,78,90
• Y <- c('a','b','c','d') creates a character vector Y with elements 'a','b','c','d'
• Printing:
• X # auto-printing
• print(X) # explicit printing
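A short sketch of the same steps in runnable form. Note that R is case sensitive, so the function is c(), not C(), and print(), not Print():

```r
X <- c(1, 2, 4, 78, 90)      # numeric vector
Y <- c("a", "b", "c", "d")   # character vector

X          # auto-printing: typing the name prints the value
print(X)   # explicit printing
```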
Using “:”
• x <- 20:50
• Creates an integer vector x with values from 20 to 50 in increments of 1
• Ending value > starting value: default increment +1
• y <- 50:20
• Creates an integer vector y with values from 50 down to 20 in increments of -1
• Ending value < starting value: default increment -1
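The two cases can be sketched as follows; the class() call is an added check showing that ":" produces integers here:

```r
x <- 20:50   # 20, 21, ..., 50 (increment +1)
y <- 50:20   # 50, 49, ..., 20 (increment -1)

length(x)    # 31 elements
class(x)     # "integer"
```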
seq()
• X <- seq(2,50)
• Creates a numeric vector from 2 to 50 with an increment of +1
• X <- seq(50,2)
• Creates a numeric vector from 50 down to 2 with an increment of -1
• X <- seq(2,50,2)
• Creates a numeric vector from 2 to 50 with an increment of +2
• (2, 4, 6, 8, 10, ..., 50)
• The increment can also be negative if the starting element > ending element, e.g., seq(50,2,-2)
• X <- seq('a','b',2) throws an error
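A minimal sketch of the seq() variants; the names s1 to s4 are illustrative, and the third positional argument is the increment (by):

```r
s1 <- seq(2, 50)       # 2, 3, ..., 50
s2 <- seq(50, 2)       # 50, 49, ..., 2
s3 <- seq(2, 50, 2)    # 2, 4, ..., 50
s4 <- seq(50, 2, -2)   # 50, 48, ..., 2 (negative increment)
```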
rep()
• X <- rep(c(1,2,3), times = 2)
• Creates a numeric vector X: 1,2,3,1,2,3
• The vector gets repeated twice
• rep(1:3, each = 2)
• Output: 1,1,2,2,3,3
• Each element in the vector gets repeated twice
• rep(1:3, each = 2, times = 3)
• Output: 1,1,2,2,3,3, 1,1,2,2,3,3, 1,1,2,2,3,3
• 2 steps
• 1: Each element gets repeated twice
• 2: The resulting vector gets repeated three times
• Other variations of rep: see ?rep
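The three calls can be compared side by side. The names r1 to r3 are illustrative:

```r
r1 <- rep(c(1, 2, 3), times = 2)      # 1 2 3 1 2 3
r2 <- rep(1:3, each = 2)              # 1 1 2 2 3 3
r3 <- rep(1:3, each = 2, times = 3)   # 1 1 2 2 3 3, repeated three times
```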
Combining vectors
• X <-c(1,2,3,4,5)
• Y<-c(1,6,7,8)
• Z<-c(X,Y)
• Combines vectors X,Y and assigns to Z, output: 1,2,3,4,5,1,6,7,8
• Q1 – Q8
vector()
• X <- vector() creates an empty vector; the default data type is logical
• X<-vector (…)
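The arguments elided on the slide are not recoverable. As an assumption, the sketch below uses the two documented arguments of vector(), mode and length, with illustrative values:

```r
v0 <- vector()                 # empty logical vector: logical(0)
v1 <- vector("numeric", 3)     # three zeros: 0 0 0
v2 <- vector("character", 2)   # two empty strings: "" ""
```

Vectors created this way are pre-filled with the default value of their type, which is useful when the elements are assigned later.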
Subsetting vectors
X <- c('a', 'b', 'c', 'd', 'e', 'f')
Index: 1 2 3 4 5 6
• Unlike Python, Java, ..., indexing starts from 1 in R
• X[1]: 'a'
• X[5]: 'e'
• X[-1]: 'b' 'c' 'd' 'e' 'f' (everything except the first element)
• X[1:3]: 'a' 'b' 'c' (the first three elements; not the same as X[3:1], which gives 'c' 'b' 'a')
• X[-1:-2]: 'c' 'd' 'e' 'f', or equivalently X[-2:-1]: 'c' 'd' 'e' 'f'
Example
• X[1:(length(X)-1)]
• Prints every element except for the last element
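The subsetting cases above can be run as one short session. The last line, X[-length(X)], is an added alternative that drops the final element with a negative index:

```r
X <- c("a", "b", "c", "d", "e", "f")

X[1]                   # "a" (indexing starts at 1)
X[5]                   # "e"
X[-1]                  # "b" "c" "d" "e" "f" (all but the first)
X[1:3]                 # "a" "b" "c"
X[3:1]                 # "c" "b" "a" (order follows the index vector)
X[-1:-2]               # "c" "d" "e" "f"
X[1:(length(X) - 1)]   # "a" "b" "c" "d" "e" (all but the last)
X[-length(X)]          # same result using a negative index
```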
Element-wise operations
• (45, 20, 25, 3, 4) + (2, 6, 10, 1, 3) = (47, 26, 35, 4, 7)
• Elements in the same position are added: 45+2, 20+6, 25+10, 3+1, 4+3
Example:
• x1 <- c(1,2,3), x2 <- c(6,7,8). What is x1+2*x2?
• x1 = (1,2,3)
• 2*(6,7,8) = (12, 14, 16): the scalar 2 is recycled across x2
• (1,2,3) + (12,14,16) = (13, 16, 19)
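Both element-wise examples in runnable form. The names v1, v2, v_sum and res are illustrative:

```r
v1 <- c(45, 20, 25, 3, 4)
v2 <- c(2, 6, 10, 1, 3)
v_sum <- v1 + v2     # position by position: 47 26 35 4 7

x1 <- c(1, 2, 3)
x2 <- c(6, 7, 8)
res <- x1 + 2 * x2   # 2 * x2 is computed first (12 14 16), then added: 13 16 19
```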
Recycling
• 1:5 + 1
• Internally 1,2,3,4,5 + 1,1,1,1,1 (1 gets recycled 5 times to match the length of the longer vector, then the element-wise operation occurs)
• 1:6 + c(1,2)
• Internally 1,2,3,4,5,6 + 1,2,1,2,1,2 (c(1,2) gets recycled to match the length of the longer vector)
• c(1,2,3,4,5,6,7) + c(1,2,3,4) (gives a warning, because 7 is not a multiple of 4)
• Internally 1,2,3,4,5,6,7 + 1,2,3,4,1,2,3
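A sketch of the three recycling cases. The result names are illustrative, and suppressWarnings() is used here only so the last line runs quietly; without it R still returns the result but warns that the longer length is not a multiple of the shorter one:

```r
r_scalar <- 1:5 + 1         # 1 is recycled five times: 2 3 4 5 6
r_pair   <- 1:6 + c(1, 2)   # c(1, 2) is recycled three times: 2 4 4 6 6 8

# Lengths 7 and 4 do not divide evenly, so R recycles partially and warns
r_warn <- suppressWarnings(c(1, 2, 3, 4, 5, 6, 7) + c(1, 2, 3, 4))   # 2 4 6 8 6 8 10
```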
Q12: Create vector q using element-wise operations
Subsetting a vector with a logical vector
• Y <- c('a','b','c','d')
• Y[c(T,T,F,T)]
• 'a' 'b' 'd' (selects the element if TRUE, otherwise skips it)
• Recycling
• Y[c(T)]
• The vector T gets recycled until it matches the length of Y
• Every element gets selected
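Logical subsetting and its recycling behaviour, as a small sketch. The third case, with a length-two index, is an added illustration of the same rule:

```r
Y <- c("a", "b", "c", "d")

sel_mask <- Y[c(TRUE, TRUE, FALSE, TRUE)]   # "a" "b" "d"
sel_all  <- Y[TRUE]                         # TRUE is recycled: "a" "b" "c" "d"
sel_odd  <- Y[c(TRUE, FALSE)]               # recycled to T F T F: "a" "c"
```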
Comparison operators
• X <- c(1,2,3,4,5,6,7)
• X > 4 (X greater than 4)
• Outputs a logical vector with TRUE for values greater than 4 and FALSE for values less than or equal to 4
• Output: logical vector: F,F,F,F,T,T,T
• X[X > 4]
• Selects elements from X which are greater than 4
• Output: 5,6,7
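The comparison and the subsetting step can be separated to make the intermediate logical vector visible. The names mask and selected are illustrative:

```r
X <- c(1, 2, 3, 4, 5, 6, 7)

mask <- X > 4         # FALSE FALSE FALSE FALSE TRUE TRUE TRUE
selected <- X[mask]   # same as X[X > 4]: 5 6 7
```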
Conditional operators in R
• Comparison operators used to build conditions in R:
• x == y : checks for equality, outputs TRUE if equal else FALSE
• x != y : checks for inequality
• x >= y : greater than or equal
• x <= y : less than or equal
• x < y : less than
• x > y : greater than
• You can combine conditions using the & (and) and | (or) operators
• Q13-Q16
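A short sketch of combining conditions element-wise with & and |. The vector and the thresholds are illustrative values:

```r
x <- c(1, 2, 3, 4, 5, 6, 7)

middle  <- x[x > 2 & x < 6]   # both conditions must hold: 3 4 5
extreme <- x[x < 2 | x > 6]   # either condition may hold: 1 7
not_4   <- x[x != 4]          # everything except 4: 1 2 3 5 6 7
```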
Coercion
• x <- c(1,2,'a',3) -- does not throw an error
• The other elements in the vector get coerced to character
• Output: '1','2','a','3'
• Priority for coercion: character > numeric > logical
• Logical values convert to 1 (TRUE) and 0 (FALSE)
• Explicit coercion:
• as.* functions
• as.character(1:20) # e.g., for customer IDs
• X <- c('a','b','c','d')
• as.numeric(X) -- R produces NAs (with a warning)
• Output: NA, NA, NA, NA
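Implicit and explicit coercion in one sketch. The variable names are illustrative; suppressWarnings() only hides the "NAs introduced by coercion" warning that as.numeric() raises for non-numeric text:

```r
mixed   <- c(1, 2, "a", 3)     # numbers become character: "1" "2" "a" "3"
num_lgl <- c(1, TRUE, FALSE)   # logicals become numeric: 1 1 0

ids <- as.character(1:20)      # explicit coercion: "1" "2" ... "20"

letters_num <- suppressWarnings(as.numeric(c("a", "b", "c", "d")))   # NA NA NA NA
```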
Some important functions
• which(): returns the indices of the elements for which the condition is satisfied
• X <- c(10,2,4,5,0)
• which(X > 2)
• Output: 1, 3, 4
• all(): returns TRUE if the condition is satisfied by all values in a vector
• all(X > 2): FALSE
• any(): returns TRUE if the condition is satisfied by any value in a vector
• any(X > 2): TRUE
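The three functions applied to the same vector. Note that all() and any() each return a single TRUE or FALSE, not a vector; the result names are illustrative:

```r
X <- c(10, 2, 4, 5, 0)

idx      <- which(X > 2)   # positions where the condition holds: 1 3 4
all_over <- all(X > 2)     # FALSE, because 2 and 0 fail the condition
any_over <- any(X > 2)     # TRUE, because at least one value passes
```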
Attributes
• Attributes: give additional information about a vector and its elements
• E.g., names of elements, dimensions, levels
• attributes(x): shows all the available attributes of x
• If there are no attributes, R outputs NULL
• We can assign attributes to a created vector
• E.g., we can assign names to elements with the function names()
• names(x) <- student_names
• Where student_names is a character vector containing the names of students
Subsetting using names attribute
• X['Cory'] -- prints the marks of Cory
• Conceptually, R finds the index whose name is "Cory" (as which() would)
• Then subsets based on that index
• X[c('Cory','James')] -- prints the marks of Cory and James
• Q16
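A sketch of assigning names and subsetting by name. The marks and the third student name ("Maya") are made-up values for illustration; only "Cory" and "James" appear in the slides. The which() line mirrors the conceptual description and is not a claim about R's internal implementation:

```r
X <- c(72, 28, 85)                            # hypothetical marks
student_names <- c("Cory", "James", "Maya")   # "Maya" is a hypothetical name

attrs_before <- attributes(X)   # NULL: a plain vector has no attributes
names(X) <- student_names       # attach the names attribute
attributes(X)                   # now lists $names

cory  <- X["Cory"]                        # marks of Cory
both  <- X[c("Cory", "James")]            # marks of Cory and James
by_ix <- X[which(names(X) == "Cory")]     # same element, found via its index
```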
Updating a vector: what if Cory's marks get updated?
• X[1] <- 35
• The element at index 1 gets updated to 35
• X[X < 30 & X > 25] <- 40 (use &, not &&, for element-wise conditions)
• All values greater than 25 and less than 30 get updated to 40
• X["Cory"] <- 67
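The three kinds of update, sketched on a small named vector. The starting marks and the name "Maya" are hypothetical:

```r
X <- c(Cory = 20, James = 27, Maya = 45)   # hypothetical starting marks

X[1] <- 35                 # update by position: Cory becomes 35
X[X < 30 & X > 25] <- 40   # update by condition: James (27) becomes 40
X["Cory"] <- 67            # update by name: Cory becomes 67
```

The element-wise operator & is needed in the condition because it compares every element; && is meant for single TRUE/FALSE values.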
is.na() and mean imputation
• x<- c(1,2,4,NA,5,NA)
• is.na(x): produces a logical vector, TRUE if element is NA else FALSE
• Output: F F F T F T
• How can the NA values be replaced with the mean of the remaining values?
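One possible answer to the question above, as a minimal sketch: compute the mean while ignoring NA (na.rm = TRUE) and assign it to the positions flagged by is.na(). The name x_mean is illustrative:

```r
x <- c(1, 2, 4, NA, 5, NA)

missing <- is.na(x)               # FALSE FALSE FALSE TRUE FALSE TRUE
x_mean  <- mean(x, na.rm = TRUE)  # mean of 1, 2, 4, 5 is 3
x[missing] <- x_mean              # x is now 1 2 4 3 5 3
```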
Factors
• factor() converts a vector into categorical data
• X <- c(1,1,1,2,2,2,3,3,3)
• sum(X): 18
• X <- factor(X)
• sum(X): error
• levels(X): categories in X
• Output: "1" "2" "3"
• class(X)
• Output: factor
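The factor example in runnable form. The factor is stored under a separate name (Xf, an illustrative choice) so the numeric vector stays available, and try() is used so the failing sum() does not stop the script:

```r
X <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
total <- sum(X)    # 18: arithmetic works on a numeric vector

Xf <- factor(X)    # the same values treated as categories
levels(Xf)         # "1" "2" "3"
class(Xf)          # "factor"

# sum() is not defined for factors, so this call raises an error
sum_fails <- inherits(try(sum(Xf), silent = TRUE), "try-error")   # TRUE
```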
Table function: frequency table
• Counts the number of times each element occurs in a vector
• X <- c('a','a','a','b','b','c','c')
• table(X):
• a: 3
• b: 2
• c: 2
• Useful when plotting a barplot
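A sketch of building the frequency table and reading one count back out. The name counts is illustrative:

```r
X <- c("a", "a", "a", "b", "b", "c", "c")

counts <- table(X)         # a: 3, b: 2, c: 2
count_a <- counts[["a"]]   # a single count, looked up by category name: 3
```

The resulting table can be passed straight to barplot(counts) to draw one bar per category.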
ls() and rm()
• ls(): lists all the objects in the current R session (environment)
• rm("d"): removes the object d
• rm(list = ls()): removes all objects from the environment
Data frames:
• Data frames are simply "tables" (rows and columns)
• All values within a column must be of the same data type (hence all the vector operations are valid for each column)
• Creation
• X <- data.frame(data for column 1, data for column 2, ...)
• The columns get bound together side by side
• Two-dimensional
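A minimal sketch of creating a data frame from two vectors. The name test and the columns student_name and marks follow the later slides; the values themselves are made up for illustration:

```r
# Hypothetical data: four students and their marks
test <- data.frame(
  student_name = c("Cory", "James", "Maya", "Lena"),
  marks = c(72, 28, 85, 61),
  stringsAsFactors = FALSE   # keep names as character in every R version
)

dim(test)   # 4 rows, 2 columns
```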
Subsetting data frames…why?
• Very useful for analyzing the data
• As a data frame is two-dimensional, it has 2 indices: [row, column]
• test[3,2]: refers to the element in the 3rd row, 2nd column
• test[1:3,1:2]: first three rows, first two columns
• Using column names
• test$student_name: refers to the column student_name
• It is a vector, so we can perform all vector operations on it
• test["student_name"]: refers to the column student_name (returned as a one-column data frame)
• test["marks"]
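The subsetting forms above, sketched on a small hypothetical data frame (the values are made up). The difference between $ and single brackets is worth noting: the first returns a vector, the second a data frame:

```r
# Hypothetical data used only for this illustration
test <- data.frame(
  student_name = c("Cory", "James", "Maya", "Lena"),
  marks = c(72, 28, 85, 61),
  stringsAsFactors = FALSE
)

cell   <- test[3, 2]        # 3rd row, 2nd column: 85
block  <- test[1:3, 1:2]    # first three rows, first two columns
as_vec <- test$marks        # column as a plain vector
as_df  <- test["marks"]     # column as a one-column data frame
```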
Students with higher than average marks?
• above_average <- (test$marks > mean(test$marks))
• test$student_name[above_average]
• Two steps:
• above_average is a logical vector
• test$student_name[above_average] selects the students for whom the vector is TRUE
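The two steps on a hypothetical data frame. The values are made up, and the column is called student_name here, matching the earlier subsetting slide:

```r
# Hypothetical data used only for this illustration
test <- data.frame(
  student_name = c("Cory", "James", "Maya", "Lena"),
  marks = c(72, 28, 85, 61),
  stringsAsFactors = FALSE
)

# Step 1: logical vector, TRUE where marks exceed the mean (61.5)
above_average <- test$marks > mean(test$marks)

# Step 2: keep only the names where the logical vector is TRUE
top_students <- test$student_name[above_average]   # "Cory" "Maya"
```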
Writing into csv
• write.csv(test, "test.csv")
• Gets saved to the working directory (folder) R is currently pointing to
• To find the working directory:
• Use getwd()
Reading a csv file
• setwd("directory path")
• read.csv("file name")
• Different functions read different file types
• dir(): lists all files in the current directory
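A round-trip sketch: write a data frame to CSV and read it back. As an assumption for this example, the file goes into R's temporary directory instead of the working directory, and row.names = FALSE is set so no extra index column is written; the data values are made up:

```r
# Hypothetical data used only for this illustration
test <- data.frame(
  student_name = c("Cory", "James", "Maya", "Lena"),
  marks = c(72, 28, 85, 61),
  stringsAsFactors = FALSE
)

csv_path <- file.path(tempdir(), "test.csv")   # temporary location (assumption)
write.csv(test, csv_path, row.names = FALSE)   # write without row numbers

getwd()                                        # where a bare file name would be saved
test_back <- read.csv(csv_path, stringsAsFactors = FALSE)   # read the file back
```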
Data inspection
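• str(): compact summary of an object's structure (column names, types, first values)
• head(): first rows (6 by default)
• tail(): last rows (6 by default)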
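The inspection functions can be tried on any data frame. This sketch uses mtcars, a dataset that ships with R; it is not mentioned in the slides and is chosen here only for convenience:

```r
str(mtcars)                     # structure: 32 observations of 11 variables

first_rows <- head(mtcars)      # first 6 rows by default
last_rows  <- tail(mtcars, 3)   # last 3 rows; the second argument sets the count
```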
Dates and Times in R
• Dates are stored internally as the number of days since 1970-01-01, while times are stored internally as the number of seconds since 1970-01-01
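The internal representation can be made visible with as.numeric(). A small sketch with illustrative dates; the time zone is fixed to UTC so the second count does not depend on the local setting:

```r
d <- as.Date("1970-01-10")
days_since <- as.numeric(d)    # 9 days after 1970-01-01

tm <- as.POSIXct("1970-01-02 00:00:00", tz = "UTC")
secs_since <- as.numeric(tm)   # 86400 seconds, i.e. one full day
```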
Data Visualization in R: Using R base graphics
• 3 plotting systems:
• base graphics
• ggplot2
• lattice
• Plot types:
• Boxplots
• Barplots
• Histograms
• Scatter plots

Data Analysis and Programming in R

  • 1. Data Analysis and Programming in R Eswar Sai Santosh Bandaru Eswar Sai Santosh Bandaru
  • 2. R • What is R? • Programming language meant for statistical analysis, data mining • https://en.wikipedia.org/wiki/R_(programming_language) • Why R? • Effective data manipulation, Storage and graphical display • Free of cost, open source • Many packages contributed by experienced programmers/ statisticians • https://cran.r-project.org/web/packages/available_packages_by_name.html • Simple and elegant code, easy to learn • Microsoft is integrating R in SQL server • Problems: • Memory management : data sits on RAM • Speed • Many developments are happening to address these problems. Eswar Sai Santosh Bandaru
  • 4. R studio Interface: Console Console: Run your code here Eswar Sai Santosh Bandaru
  • 5. R studio Interface: Editor Save and edit your code here Eswar Sai Santosh Bandaru
  • 6. R studio Interface: Output Output – plots and help Eswar Sai Santosh Bandaru
  • 7. General Things: • Case sensitive • Shortcuts: • CTRL+ENTER (Important): Send code from editor to console and execute • CTRL+2: Move the console from editor to console • CTRL+1: MOVE the cursor from console to editor • CTRL+UP IN CONSOLE: Retrieve previous commands • # hash is used for commenting the code • CTRL+SHIFT+C: comment/uncomment a block of code Eswar Sai Santosh Bandaru
  • 8. R as a calculator • + : Addition -- 2+3 output:5 • - : Subtraction -- 4-5 output: -1 • * : Multiplication - 2*3 output:8 • ^ or ** : Exponentiation -- 2^3 or 2**3 • / : Division - 17/3 -- 5.66667 • %% : Modulo Division - 17%3-- 2 • %/% : Integer Division -17%/%3 -- 5 Eswar Sai Santosh Bandaru
  • 9. Assignments and Expression • “<-” is the assignment operator in R • a<-3, 3 gets assigned to variable a • Expressions • Combination of numbers/variables/operators • E.g., 2+3*a/14 • Order of Evaluation: • ORDER OF EVALUATION: BRACKETS -> EXPONENTIATION-> DIVISION -> MULTILICATION -> ADDITION/SUBTRACTION • E.g., 7*9/13 - 10.1111 • -2^0.5 -- -1.414 • (-2) ^0.5 - NaN • Q1 Eswar Sai Santosh Bandaru
  • 10. Data Types • Numeric: Real Numbers. E.g., 1.24, -3.12, 1 • Integer: Integer values. Suffix L is added • Character: E.g., ‘a’ , “a”, “Hello World!”, “2” • Logical: Boolean Type. TRUE (1), FALSE(0), T, F • Complex: a+bi . a,b are real numbers • Class(): function is used to check the class • E.g., class(24) -- numeric • E.g., class(24L)-- integer Eswar Sai Santosh Bandaru
  • 11. Data structures • 4 main types: • Vector • Matrices • Lists • Data frames • We would discuss vectors and data frames in today’s session Eswar Sai Santosh Bandaru
  • 12. Vectors: • One dimension collection of objects of same kind (same data type) • Vectors in R are similar to arrays in any other programming language • Syntax: (1,2,3,4,5) . 1,2,3,4,5 are called elements • (1,2,3,4,5) : numeric vector • (‘a’,’b’,’c’,’d’): character vector • (T, F, T, T): logical vector • (1L,2L,3L): integer vector • (1,2,3,4,6) ----- valid vector • (1,’a’,3,’t’) ------ invalid vector (but R doesn’t throw an error due to coercion Eswar Sai Santosh Bandaru
  • 13. Creating • Basic ways: • Using c() • Using “:” • Using seq() • Using rep() • Using vector() Eswar Sai Santosh Bandaru
  • 14. C() combine function • Syntax: • X<- C(1,2,4,78,90) creates a Numeric vector X with elements 1,2,4,78,90 • Y<- c(‘a’,’b’,’c’,’d’) creates a character vector Y with elements ‘a’, ‘b’, ‘c’,’d’ • Printing: • X # Auto printing • Print(x) # explicit printing Eswar Sai Santosh Bandaru
  • 15. Using “:” • x <- 20:50 • Creates a numeric vector x with values starting from 20 till 50 with increments of 1 • Ending value > Starting Value - default increment +1 • y <- 50:20 • Creates a numeric vector x with values starting from 50 till 20 with increments of -1 • Ending value < Starting Value .- default increment -1 Eswar Sai Santosh Bandaru
  • 16. Seq() • X <- seq(2,50) • Creates a numeric vector starting from 2 till 50 with increment of +1 • X <- seq(50,2) • Creates a numeric vector starting from 50 till 2 with increment of -1 • X <- seq(2,50,2) • Creates a numeric vector starting from 2 till 50 with increment of +2 • Increment can also be –ve if starting element > ending element • ( 2, 4,6,8,10…….,50) • X<- seq(‘a’,’b’,2) Throws an error Eswar Sai Santosh Bandaru
  • 17. Rep() • X <- rep(c(1,2,3),times =2) • Creates vector numeric vector X: 1,2,3,1,2,3 • The vector gets repeated twice • rep(1:3, each =2) • Output: 1,1,2,2,3,3 • Each element in the vector gets repeated twice • rep(1:3,each=2,times =3) • Output: 1,1,2,2,3,3, 1,1,2,2,3,3, 1,1,2,2,3,3, • 2 steps • 1:Each element gets repeated twice • 2: the entire vector itself gets repeated thrice • Different variations of rep-- ?rep Eswar Sai Santosh Bandaru
  • 18. Combining vectors • X <-c(1,2,3,4,5) • Y<-c(1,6,7,8) • Z<-c(X,Y) • Combines vectors X,Y and assigns to Z, output: 1,2,3,4,5,1,6,7,8 • Q1 – Q8 Eswar Sai Santosh Bandaru
  • 19. vector() • X<-vector() …empty vector with default data type:logical • X<-vector (…) Eswar Sai Santosh Bandaru
  • 20. Subsetting vectors X<-( ‘a’ , ‘b’, ‘c’, ‘d’, ‘e’, ‘f’) Index: 1 2 3 4 5 6 X[1]: ‘a’ • Unlike python, java…indexing starts from 1 in R Eswar Sai Santosh Bandaru
  • 21. Subsetting vectors X<-( ‘a’ , ‘b’, ‘c’, ‘d’, ‘e’, ‘f’) Index: 1 2 3 4 5 6 X[5]: ‘e’ Eswar Sai Santosh Bandaru
  • 22. Subsetting vectors X<-( ‘a’ , ‘b’, ‘c’, ‘d’, ‘e’, ‘f’) Index: 1 2 3 4 5 6 X[-1]: ‘b’ ‘c’ ‘d’ ‘e’ ‘f’ Expect first element Eswar Sai Santosh Bandaru
  • 23. Subsetting vectors X<-( ‘a’ , ‘b’, ‘c’, ‘d’, ‘e’, ‘f’) Index: 1 2 3 4 5 6 X[1:3]: ‘a’ ‘b’ ‘c’ Not same as x[3:1] Prints first three elements Eswar Sai Santosh Bandaru
  • 24. Subsetting vectors X<-( ‘a’ , ‘b’, ‘c’, ‘d’, ‘e’, ‘f’) Index: 1 2 3 4 5 6 X[-1:-2]: ‘c’ ‘d’ ‘e’ ‘f’ or X[-2:-1]: ‘c’ ‘d’ ‘e’ ‘f’ Eswar Sai Santosh Bandaru
  • 25. Example • X[1:(length(X)-1)] • Prints every element except for the last element Eswar Sai Santosh Bandaru
  • 26. Element-wise operations
    • (45, 20, 25, 3, 4) + (2, 6, 10, 1, 3) = (47, 26, 35, 4, 7)
    • Each element is combined with the element at the same position in the other vector
  • 27. Example:
    • x1 <- c(1,2,3), x2 <- c(6,7,8). What is x1 + 2*x2?
    • 2*(6,7,8) = (12, 14, 16) … recycling! (the 2 is recycled to length 3)
    • (1,2,3) + (12,14,16) = (13, 16, 19)
  • 28. Recycling
    • 1:5 + 1
    • Internally 1,2,3,4,5 + 1,1,1,1,1 (1 gets recycled 5 times to match the length of the longer vector, then the element-wise operation occurs)
    • 1:6 + c(1,2)
    • Internally 1,2,3,4,5,6 + 1,2,1,2,1,2 (c(1,2) gets recycled to match the length of the longer vector)
    • c(1,2,3,4,5,6,7) + c(1,2,3,4) (gives a warning, because 7 is not a multiple of 4)
    • Internally 1,2,3,4,5,6,7 + 1,2,3,4,1,2,3
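A small sketch of element-wise arithmetic and recycling, including the warning case. The handler below only records that a warning happened and then silences it; the variable names are illustrative.

```r
x1 <- c(1, 2, 3)
x2 <- c(6, 7, 8)
combined <- x1 + 2 * x2        # 13 16 19

a <- 1:5 + 1                   # 2 3 4 5 6
b <- 1:6 + c(1, 2)             # 2 4 4 6 6 8

# Lengths 7 and 4: R still recycles, but warns
warned <- FALSE
d <- withCallingHandlers(
  c(1, 2, 3, 4, 5, 6, 7) + c(1, 2, 3, 4),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
```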
  • 29. Q12: Create vector q using element-wise operations
  • 30. Subsetting a vector with a logical vector
    • Y <- c('a','b','c','d')
    • Y[c(T,T,F,T)]
    • Output: 'a' 'b' 'd' (selects the element if TRUE, otherwise skips it)
    • Recycling
    • Y[c(T)]
    • The vector T gets recycled until it matches the length of Y, so every element gets printed
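The logical subsetting above as a short sketch. TRUE and FALSE are spelled out here because T and F are ordinary variables that can be reassigned; the last line is an extra illustration of recycling a shorter logical vector.

```r
Y <- c("a", "b", "c", "d")

picked <- Y[c(TRUE, TRUE, FALSE, TRUE)]  # keeps positions marked TRUE
everything <- Y[TRUE]                    # TRUE is recycled to length 4
every_other <- Y[c(TRUE, FALSE)]         # recycled to TRUE FALSE TRUE FALSE
```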
  • 31. Comparison operators
    • X <- c(1,2,3,4,5,6,7)
    • X > 4 (X greater than 4)
    • Outputs a logical vector with TRUE for values greater than 4 and FALSE for values less than or equal to 4
    • Output: logical vector: F,F,F,F,T,T,T
    • X[X > 4]
    • Selects the elements of X which are greater than 4
    • Output: 5,6,7
  • 32. Conditional operators in R
    • x == y: checks for equality, outputs TRUE if equal, else FALSE
    • x != y: checks for inequality
    • x >= y: greater than or equal
    • x <= y: less than or equal
    • x < y: less than
    • x > y: greater than
    • You can combine conditions using the & (and) and | (or) operators
    • Q13-Q16
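A sketch of comparison operators used for subsetting, with conditions combined using the element-wise & and | operators. The result names are illustrative.

```r
X <- c(1, 2, 3, 4, 5, 6, 7)

gt4 <- X > 4                     # logical vector, one value per element
above_four <- X[gt4]             # 5 6 7

between <- X[X >= 3 & X <= 5]    # both conditions must hold
outside <- X[X < 2 | X > 6]      # either condition may hold
not_four <- X[X != 4]
```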
  • 33. Coercion
    • x <- c(1,2,'a',3): does not throw an error
    • The other elements in the vector get coerced to character
    • Output: '1','2','a','3'
    • Priority for coercion: character > numeric > logical
    • Logical values convert to 1 (TRUE) and 0 (FALSE)
    • Explicit coercion: the as.* functions
    • as.character(1:20) # e.g. customerID
    • X <- c('a','b','c','d')
    • as.numeric(X): R produces NAs (with a warning)
    • Output: NA, NA, NA, NA
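Implicit and explicit coercion in a few lines. suppressWarnings() is used only to keep the output quiet; without it, as.numeric() on letters also emits a warning.

```r
mixed <- c(1, 2, "a", 3)          # everything becomes character
num_lgl <- c(1, TRUE, FALSE)      # logicals become 1 and 0
count_true <- sum(c(TRUE, FALSE, TRUE))  # TRUE counts as 1

ids <- as.character(1:3)          # explicit: numbers to text
nas <- suppressWarnings(as.numeric(c("a", "b", "c", "d")))  # all NA
```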
  • 34. Some important functions
    • which(): returns the indices of the vector at which the condition is satisfied
    • x <- c(10,2,4,5,0)
    • which(x > 2)
    • Output: 1, 3, 4
    • all(): returns a single TRUE if the condition is satisfied by all values in the vector, otherwise FALSE
    • all(x > 2): FALSE
    • any(): returns a single TRUE if the condition is satisfied by at least one value in the vector, otherwise FALSE
    • any(x > 2): TRUE
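The three functions above applied to the same vector, as a quick check:

```r
x <- c(10, 2, 4, 5, 0)

idx <- which(x > 2)        # positions where the condition holds: 1 3 4
all_above <- all(x > 2)    # FALSE, because 2 and 0 fail the test
any_above <- any(x > 2)    # TRUE, because at least one value passes
```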
  • 35. Attributes
    • Attributes give additional information about the elements of a vector
    • E.g., names of elements, dimensions, levels
    • attributes(x): shows all the available attributes of x
    • If there are no attributes, R outputs NULL
    • We can assign attributes to a created vector
    • E.g., we can assign names to elements with the function names()
    • names(x) <- student_names
    • where student_names is a character vector containing the names of the students
  • 36. Subsetting using the names attribute
    • X['Cory']: prints the marks of Cory
    • Conceptually, R finds the index whose name is "Cory" (as which(names(X) == 'Cory') would), then subsets based on that index
    • X[c('Cory','James')]: prints the marks of Cory and James
    • Q16
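A sketch of assigning and using the names attribute. The marks and the third student name are made-up values for illustration; only Cory and James come from the slides.

```r
marks <- c(28, 45, 39)                        # hypothetical marks
student_names <- c("Cory", "James", "Ana")    # "Ana" is a made-up name

before <- attributes(marks)                   # NULL: no attributes yet
names(marks) <- student_names

cory <- marks["Cory"]                         # subset by name
two <- marks[c("Cory", "James")]
cory_index <- which(names(marks) == "Cory")   # the equivalent index lookup
```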
  • 37. Updating a vector: what if Cory's marks get updated?
    • X[1] <- 35
    • The element at index 1 gets updated to 35
    • X[X < 30 & X > 25] <- 40
    • All the values which are greater than 25 and less than 30 get updated to 40 (use the element-wise &, not &&)
    • X["Cory"] <- 67
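The three kinds of update above, run in order on a small named vector. The names other than Cory and James, and all the marks, are invented for the example.

```r
X <- c(Cory = 28, James = 45, Ana = 26, Lee = 31)  # hypothetical marks

X[1] <- 35                  # update by position: Cory becomes 35
X[X < 30 & X > 25] <- 40    # update by condition: only Ana (26) matches
X["Cory"] <- 67             # update by name
```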
  • 38. is.na() and mean imputation
    • x <- c(1,2,4,NA,5,NA)
    • is.na(x): produces a logical vector, TRUE if the element is NA, else FALSE
    • Output: F F F T F T
    • How do we replace the NAs with the mean of the non-missing values?
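One possible answer to the question above, as a sketch: compute the mean while ignoring NAs, then assign it to the positions flagged by is.na().

```r
x <- c(1, 2, 4, NA, 5, NA)

missing <- is.na(x)               # FALSE FALSE FALSE TRUE FALSE TRUE
x_mean <- mean(x, na.rm = TRUE)   # mean of 1, 2, 4, 5
x[missing] <- x_mean              # fill only the NA positions
```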
  • 39. Factors
    • factor() converts a vector into categorical data
    • X <- c(1,1,1,2,2,2,3,3,3)
    • sum(X): 18
    • X <- factor(X)
    • sum(X): error (a sum is not meaningful for categories)
    • levels(X): the categories in X
    • Output: "1" "2" "3"
    • class(X)
    • Output: "factor"
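A sketch of the factor example. The error from sum() is caught so the snippet keeps running, and the last line shows one common way to get the original numbers back (an addition, not from the slides).

```r
X <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
total <- sum(X)                    # fine on a numeric vector

f <- factor(X)                     # now categorical
f_levels <- levels(f)              # categories, stored as character
f_class <- class(f)

# sum() is not defined for factors; capture the error
sum_result <- tryCatch(sum(f), error = function(e) "error")

# Convert via character to recover the original values
back <- as.numeric(as.character(f))
```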
  • 40. table() function: frequency table
    • Counts the number of times each element occurs in a vector
    • x <- c('a','a','a','b','b','c','c')
    • table(x):
    • a: 3
    • b: 2
    • c: 2
    • Useful when plotting a barplot
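A short sketch of table(). The barplot call is left as a comment so the snippet does not open a graphics device.

```r
x <- c("a", "a", "a", "b", "b", "c", "c")

counts <- table(x)           # one count per distinct value
a_count <- counts[["a"]]     # look up a single count by name
categories <- names(counts)

# barplot(counts)            # would draw one bar per category
```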
  • 41. ls() and rm()
    • ls(): lists all the objects in the current R session (environment)
    • rm("d"): removes the object d
    • rm(list = ls()): removes all objects from the environment
  • 42. Data frames:
    • Data frames are simply "tables" (rows and columns)
    • Each column holds a single data type, though different columns may have different types (hence all the vector operations are valid for each column)
    • Creation
    • X <- data.frame(data for column 1, data for column 2, …)
    • The vectors get bound together as columns
    • 2-dimensional
  • 43. Subsetting data frames…why?
    • Very useful for analyzing the data
    • As it is 2-dimensional, it has 2 indices: [row, column]
    • test[3,2]: refers to the element in the 3rd row, 2nd column
    • test[1:3,1:2]: first three rows, first two columns
    • Using column names
    • test$student_name: refers to the column student_name
    • It returns a vector, so we can perform all vector operations on it
    • test["student_name"]: refers to the column student_name, returned as a one-column data frame
    • test["marks"]
  • 44. Students with higher than average marks?
    • above_average <- (test$marks > mean(test$marks))
    • test$student_names[above_average]
    • Two steps:
    • above_average is a logical vector
    • test$student_names[above_average] selects the students for whom the vector is TRUE
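A sketch that builds a small data frame and walks through the subsetting steps from the last three slides. The contents of test are invented for illustration; the column names follow the slide above.

```r
# Hypothetical data; stringsAsFactors = FALSE keeps names as character
test <- data.frame(
  student_names = c("Cory", "James", "Ana", "Lee"),
  marks = c(35, 72, 58, 90),
  stringsAsFactors = FALSE
)

cell <- test[3, 2]              # 3rd row, 2nd column
block <- test[1:3, 1:2]         # first three rows, first two columns
marks_vec <- test$marks         # a plain numeric vector
marks_df <- test["marks"]       # still a data frame with one column

# Step 1: logical vector; step 2: use it to pick names
above_average <- test$marks > mean(test$marks)
top_students <- test$student_names[above_average]
```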
  • 45. Writing to a CSV file
    • write.csv(test, "test.csv")
    • The file gets saved to the working directory (folder) R is currently pointing to
    • To find the working directory, use getwd()
  • 46. Reading a CSV file
    • setwd("directory path")
    • read.csv("file name")
    • There are different functions to read different file types
    • dir(): lists all files in the current directory
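A round-trip sketch for writing and reading a CSV. It writes to a temporary folder via tempdir() instead of changing the working directory, which is an assumption made here to keep the example side-effect free; the data is made up.

```r
test <- data.frame(
  student_names = c("Cory", "James"),   # hypothetical rows
  marks = c(35, 72),
  stringsAsFactors = FALSE
)

path <- file.path(tempdir(), "test.csv")
write.csv(test, path, row.names = FALSE)   # skip the row-number column

round_trip <- read.csv(path, stringsAsFactors = FALSE)
written_files <- dir(tempdir())            # the new file is listed here

str(round_trip)        # column types at a glance
head(round_trip, 1)    # first row only
```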
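  • 47. Data inspection
    • str(): shows the structure of an object (column names, types, first values)
    • head(): shows the first rows (6 by default)
    • tail(): shows the last rows (6 by default)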
  • 48. Dates and Times in R
    • Dates (class Date) are stored internally as the number of days since 1970-01-01
    • Times (class POSIXct) are stored internally as the number of seconds since 1970-01-01 (UTC)
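A quick way to see the internal representation is to convert a date or time to a number. The specific dates below are arbitrary examples.

```r
d <- as.Date("1970-01-10")
d_days <- as.numeric(d)             # days since 1970-01-01

tm <- as.POSIXct("1970-01-02 00:00:00", tz = "UTC")
tm_seconds <- as.numeric(tm)        # seconds since 1970-01-01 UTC

# Because dates are numbers underneath, subtraction gives a day count
gap <- as.numeric(as.Date("2020-03-01") - as.Date("2020-02-01"))
```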
  • 49. Data Visualization in R: Using R base graphics
    • 3 plotting systems: base graphics, ggplot2, lattice
    • Plot types: boxplots, barplots, histograms, scatter plots