Introduction to R for Data Science
Lecturers
dipl. ing Branko Kovač
Data Analyst at CUBE/Data Science Mentor
at Springboard
Data Science zajednica Srbije
branko.kovac@gmail.com
dr Goran S. Milovanović
Data Scientist at DiploFoundation
Data Science zajednica Srbije
goran.s.milovanovic@gmail.com
goranm@diplomacy.edu
Control Flow in R
• for, while, repeat
• if, else
• switch
Intro to R for Data Science
Session 4: Control Flow
# Introduction to R for Data Science
# SESSION 4 :: 19 May, 2016
# Starting with a simple 'if'
num <- 2 # some value to test with
if (num > 0) print("num is positive")
# if the condition num > 0 holds, then print() is executed
# Sometimes 'if' has its 'else'
if (num > 0) { # test to see if it's positive
  print("num is positive") # print in case of a positive number
} else {
  print("num is not positive") # zero or negative
}
# Careful: at top level, put 'else' on the same line as the closing '}' of the if block
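The two-branch test above cannot tell zero from a negative number. A minimal sketch of an `else if` chain that separates all three cases; the helper name `describe_sign` is hypothetical and not part of the original session:

```r
# describe_sign() is a hypothetical helper, used only for illustration
describe_sign <- function(num) {
  if (num > 0) {
    "num is positive"
  } else if (num < 0) {
    "num is negative"
  } else {
    "num is zero"
  }
}

print(describe_sign(2))
print(describe_sign(0))
```

Each `else` sits on the same line as the preceding `}`, which is the placement the slide warns about.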
Vectorized: ifelse
• for, while, repeat
• if, else, ifelse
• switch
# R is vectorized so there's vectorized if-else
simple_vect <- c(1, 3, 12, NA, 2, NA, 4) # just another num vector with NAs
ifelse(is.na(simple_vect), "nothing here", "some number")
# gives "nothing here" where the element is NA and "some number" otherwise
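As a further sketch of the same idea, ifelse() can return values instead of labels, for example replacing each NA with the mean of the observed values. Using the mean as the fill value is an assumption made here for illustration only; the names `fill_value` and `filled` are hypothetical.

```r
simple_vect <- c(1, 3, 12, NA, 2, NA, 4)

# mean of the non-missing values (assumed fill value, for illustration)
fill_value <- mean(simple_vect, na.rm = TRUE)

# element-wise: take fill_value where the element is NA, else keep the element
filled <- ifelse(is.na(simple_vect), fill_value, simple_vect)
print(filled)
```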
For loops: slow and slower
# A for loop always works the same way
for (i in simple_vect) print(i)
# Be aware that loops can be slow if you grow a vector inside them:
vec <- numeric()
system.time(
for(i in seq_len(50000-1)) {
some_calc <- sqrt(i/10)
# this is what makes it slow:
vec <- c(vec, some_calc)
})
# This solution is slightly faster
iter <- 50000;
# this makes it faster:
vec <- numeric(length=iter)
system.time(
for(i in seq_len(iter-1)) {
some_calc <- sqrt(i/10);
vec[i] <- some_calc # ...not this!
})
For loops: slow and slower
# This solution is even faster
iter <- 50000
vec <- numeric(length=iter) # not because of this...
system.time(
for(i in seq_len(iter-1)) {
vec[i] <- sqrt(i/10) # ...but because of this!
})
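For comparison, the same values can be computed with no explicit loop at all, because sqrt() and / are vectorized. A sketch under the same settings as above; the names `vec_loop` and `vec_fast` are hypothetical, and no timings are quoted because they vary by machine:

```r
iter <- 50000

# loop version with preallocation, as in the slides
vec_loop <- numeric(length = iter - 1)
for (i in seq_len(iter - 1)) {
  vec_loop[i] <- sqrt(i / 10)
}

# vectorized version: one call over the whole index vector
vec_fast <- sqrt(seq_len(iter - 1) / 10)

print(all.equal(vec_loop, vec_fast))
```

Wrapping either version in system.time() gives a direct comparison on your own machine.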
For loops vs. vectorized functions
# Another example of how loops can be slow
# (loop vs vectorized functions)
iter <- 50000
system.time(for (i in 1:iter) {
vec[i] <- rnorm(n=1, mean=0, sd=1)
# approach from previous example
})
system.time(y <- rnorm(iter, 0, 1)) # but this is much much faster
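The loop and the single vectorized call draw from the same random stream, so with R's default generator settings they should produce the same numbers once the seed is reset. A small sketch to check this; the seed value and the names `draws_loop` and `draws_vec` are arbitrary choices for illustration:

```r
n <- 1000

set.seed(42)                      # arbitrary seed, chosen for illustration
draws_loop <- numeric(n)          # preallocate, then fill one draw at a time
for (i in seq_len(n)) {
  draws_loop[i] <- rnorm(n = 1, mean = 0, sd = 1)
}

set.seed(42)                      # reset the generator to the same state
draws_vec <- rnorm(n, mean = 0, sd = 1)   # one vectorized call

print(isTRUE(all.equal(draws_loop, draws_vec)))
```

The vectorized call is therefore a drop-in replacement here: same result, far fewer function calls.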
while, repeat…
# R also has a while loop
r <- 1 # initializing some variable
while (r < 5) { # while r < 5
print(r) # print r
r <- r + 1 # increase r by 1
}
# Nope, we didn't forget the 'repeat' loop
i <- 1
repeat { # there is no condition!
  print(i)
  i <- i + 1
  if (i == 10) break
  # ...so we have to break out of it if we
  # don't want an infinite loop
}
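The repeat loop above prints 1 through 9 and stops when i reaches 10. A sketch of the equivalent while loop, where the exit test moves into the loop condition:

```r
i <- 1
while (i < 10) {   # same exit point as 'if (i == 10) break'
  print(i)
  i <- i + 1
}
```

Which form to use is mostly a matter of where the exit test reads most naturally: repeat suits tests in the middle or at the end of the body, while suits tests at the start.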
switch
switch(2, "data", "science", "serbia") # choose one option based on value
# More on switch:
switchIndicator <- "A"
# switchIndicator <- "switchIndicator"
# switchIndicator <- "AvAvAv" # play with these three conditions
# one of the rare situations where you do not need to enclose strings in ' ' or " "
switch(switchIndicator,
A = {print(switchIndicator)},
switchIndicator = {unlist(strsplit(switchIndicator,"h"))},
AvAvAv = {print(nchar(switchIndicator))}
)
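When the first argument is a character string, base R's switch() also lets several names share one result: an alternative left empty falls through to the next non-empty one. A minimal sketch; `classify_letter` is a hypothetical helper, not part of the original session:

```r
# classify_letter() is a hypothetical helper, used only for illustration
classify_letter <- function(x) {
  switch(x,
         a = ,                # empty alternative: falls through to the next one
         e = "vowel",
         b = "consonant",
         "something else")    # unnamed default for unmatched strings
}

print(classify_letter("a"))
print(classify_letter("z"))
```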
switch()
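type = 2 # a number: picks the second choice by position, the names are ignored
cc <- c("A", "B", "C")
switch(type,
  c1 = {print(cc[1])},
  c2 = {print(cc[2])},
  c3 = {print(cc[3])},
  {print("Beyond C...")} # default choice, used when type is a non-matching string such as "c4"
);
# However…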
switch()
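# Be careful with switch: an unnamed default works only when the first argument
# is a character string and the other choices are named.
# With a number, choices are simply picked by position:
type = 4
cc <- c("A", "B", "C")
switch(type,
  print(cc[1]),
  print(cc[2]),
  print(cc[3]),
  {print("Beyond C...")}
  # type = 4 selects this fourth choice by position, it is not a default;
  # with type = 5 nothing matches and switch() returns NULL invisibly
)
# switch is often faster than a chain of if… else… (!)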
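How switch() treats an unnamed last alternative depends on the type of its first argument. A sketch of both cases, following the documented behaviour of base R's switch(); the string "c9" is just an arbitrary non-matching value, and the result names are hypothetical:

```r
cc <- c("A", "B", "C")

# numeric first argument: alternatives are chosen by position, no default exists
by_position <- switch(4, cc[1], cc[2], cc[3], "Beyond C...")
no_match    <- switch(5, cc[1], cc[2], cc[3], "Beyond C...")

# character first argument: named alternatives plus one unnamed default
by_name <- switch("c9", c1 = cc[1], c2 = cc[2], c3 = cc[3], "Beyond C...")

print(by_position)
print(is.null(no_match))
print(by_name)
```

If a default is needed for a numeric selector, one option is to convert it to a string first, for example with paste0("c", type), and use named alternatives.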
Vectorization
### vectorization in R
dataSet <- USArrests;
# dataSet$Murder, dataSet$Assault, dataSet$Rape: columns of dataSet
# in behavioral sciences (psychology or biomedical sciences, for example) we would call them:
# variables (or factors, even more often)
# in data science and machine learning, we usually call them: FEATURES
# in psychology and behavioral sciences, the usage of the term "feature" is usually constrained
# to theories of categorization and concept learning
# Task: classify the US states according to some global indicator of violent crime
# Two categories (simplification): more dangerous (T) and less dangerous (F)
# We have three features: Murder, Rape, Assault, all per 100,000 inhabitants
# The idea is to combine the three available features.
# Let's assume that we arbitrarily assign the following preference order over the features:
# Murder > Rape > Assault
# in terms of the severity of the consequences of the associated criminal acts
Vectorization
# Let's first isolate the features from the data.frame
featureMatrix <- as.matrix(dataSet[, c(1,4,2)]);
# Let's WEIGHT the features in accordance with the imposed preference order:
weigthsVector <- c(3,2,1); # mind the order of the columns in featureMatrix
# Essentially, we want our global indicator to be a linear combination of all three selected
# features, where each feature is weighted by the corresponding element of the weigthsVector:
featureMatrix <- cbind(featureMatrix,numeric(length(featureMatrix[,1])));
for (i in 1:length(featureMatrix[,1])) {
featureMatrix[i,4] <- sum(weigthsVector*featureMatrix[i,1:3]);
# don't forget: this "*" multiplication in R is vectorized and operates element-wise
# weigthsVector and featureMatrix[i,1:3] both have length 3, OK
# sum() then produces the desired linear combination
}
Vectorization
# Classification; in the simplest case, let's simply take a look at
# the distribution of our global indicator:
hist(featureMatrix[,4],20); # it's multimodal and not too symmetric; go for median
criterion <- median(featureMatrix[,4]);
# And classify:
dataSet$Dangerous <- ifelse(featureMatrix[,4]>=criterion,T,F);
# Ok. You will never do this before you have a model that has actually *learned* the
# most adequate feature weights. This is an exercise only.
# ***Important***: have you seen the for loop above? Well...
# Never do that.
dataSet$Dangerous <- NULL;
Vectorization
# In Data Science, you will be working with huge amounts of quantitative data.
# For loops are slow. But in vector programming languages like R...
# matrix computations are seriously fast.
# What you ***want to do*** is the following:
# Let's first isolate the features from the data.frame
featureMatrix <- as.matrix(dataSet[, c(1,4,2)]);
# Let's WEIGHT the features in accordance with the imposed preference order:
weigthsVector <- c(3,2,1); # mind the order of the columns in featureMatrix
# Feature weighting:
wF <- weigthsVector %*% t(featureMatrix);
# In R, t() is for: transpose
# In R, %*% is matrix multiplication
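To confirm that the matrix product computes the same linear combination as the row-by-row loop, the two can be compared directly. This sketch uses a swapped-in orientation, featureMatrix %*% weigthsVector, which avoids the transpose; it is an alternative to the line above, not the authors' code. The names `loopScore` and `matScore` are hypothetical.

```r
dataSet <- USArrests
featureMatrix <- as.matrix(dataSet[, c(1, 4, 2)])  # Murder, Rape, Assault
weigthsVector <- c(3, 2, 1)                        # spelling kept from the slides

# loop version: one weighted sum per row
loopScore <- numeric(nrow(featureMatrix))
for (i in seq_len(nrow(featureMatrix))) {
  loopScore[i] <- sum(weigthsVector * featureMatrix[i, ])
}

# matrix version: a 50 x 3 matrix times a length-3 vector gives a 50 x 1 result
matScore <- as.vector(featureMatrix %*% weigthsVector)

print(all.equal(loopScore, matScore))
```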
Vectorization
# oh yes: R knows about row and column vectors - and you want to put this one
# as a COLUMN in your dataSet data.frame, while wF is currently a ROW vector, look:
wF
length(wF)
wF <- t(wF)
# and classify:
dataSet$Dangerous <- ifelse(wF>=median(wF),T,F);
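A comparison such as >= already returns a logical vector, so the classification can also be written without ifelse(). A sketch of that variant, using a plain vector for the score so the new column is an ordinary logical vector instead of a one-column matrix; the name `score` is hypothetical:

```r
dataSet <- USArrests

# weighted indicator as a plain numeric vector (Murder, Rape, Assault; weights 3, 2, 1)
score <- as.vector(as.matrix(dataSet[, c(1, 4, 2)]) %*% c(3, 2, 1))

# the comparison itself is the classification
dataSet$Dangerous <- score >= median(score)

print(table(dataSet$Dangerous))
```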
  • 1. Introduction to R for Data Science Lecturers dipl. ing Branko Kovač Data Analyst at CUBE/Data Science Mentor at Springboard Data Science zajednica Srbije branko.kovac@gmail.com dr Goran S. Milovanović Data Scientist at DiploFoundation Data Science zajednica Srbije goran.s.milovanovic@gmail.com goranm@diplomacy.edu
  • 2. Control Flow in R • for, while, repeat • if, else • switch Intro to R for Data Science Session 4: Control Flow # Introduction to R for Data Science # SESSION 4 :: 19 May, 2016 # Starting with simple 'if‘ num <- 2 # some value to test with if (num > 0) print("num is positive") # if condition num > 0 stands than print() is executed # Sometimes 'if' has its 'else‘ if (num > 0) { # test to see if it's positive print("num is positive") # print in case of positive number } else { print("num is negative") # it's negative if not positive } # Careful: place your else right after the end (‘}’) of the conditional block
  • 3. Vectorized: ifelse • for, while, repeat • if, else, ifelse • switch Intro to R for Data Science Session 4: Control Flow # Introduction to R for Data Science # SESSION 4 :: 19 May, 2016 # R is vectorized so there's vectorized if-else simple_vect <- c(1, 3, 12, NA, 2, NA, 4) # just another num vector with NAs ifelse(is.na(simple_vect), "nothing here", "some number") # nothing here if it's an NA or it's a number
  • 4. For loops: slow and slower Intro to R for Data Science Session 4: Control Flow # Introduction to R for Data Science # SESSION 4 :: 19 May, 2016 # For loop is always working same way for (i in simple_vect) print(i) # Be aware that loops can be slow if vec <- numeric() system.time( for(i in seq_len(50000-1)) { some_calc <- sqrt(i/10) # this is what makes it slow: vec <- c(vec, some_calc) }) # Introduction to R for Data Science # SESSION 4 :: 19 May, 2016 # This solution is slightly faster iter <- 50000; # this makes it faster: vec <- numeric(length=iter) system.time( for(i in seq_len(iter-1)) { some_calc <- sqrt(i/10); vec[i] <- some_calc # ...not this! })
  • 5. For loops: slow and slower Intro to R for Data Science Session 4: Control Flow # Introduction to R for Data Science # SESSION 4 :: 19 May, 2016 # This solution is even faster iter <- 50000 vec <- numeric(length=iter) # not because of this... system.time( for(i in seq_len(iter-1)) { vec[i] <- sqrt(i/10) # ...but because of this! })
  • 6. For loops vs. vectorized functions Intro to R for Data Science Session 4: Control Flow # Introduction to R for Data Science # SESSION 4 :: 19 May, 2016 # Another example how loops can be slow # (loop vs vectorized functions) iter <- 50000 system.time(for (i in 1:iter) { vec[i] <- rnorm(n=1, mean=0, sd=1) # approach from previous example }) system.time(y <- rnorm(iter, 0, 1)) # but this is much much faster
  • 7. while, repeat… Intro to R for Data Science Session 4: Control Flow # Introduction to R for Data Science # SESSION 4 :: 19 May, 2016 # R also knows about while loop r <- 1 # initializing some variable while (r < 5) { # while r < 5 print(r) # print r r <- r + 1 # increase r by 1 } # Introduction to R for Data Science # SESSION 4 :: 19 May, 2016 # Nope, we didn't forget 'repeat' loop i <- 1 repeat { # there is no condition! print(i) i <- i + 1 if (i == 10) break # ...so we have to break it if we # don't want infinite loop }
  • 8. switch Intro to R for Data Science Session 4: Control Flow # Introduction to R for Data Science # SESSION 4 :: 19 May, 2016 switch(2, "data", "science", "serbia") # choose one option based on value # More on switch: switchIndicator <- "A“ # switchIndicator <- "switchIndicator“ # switchIndicator <- "AvAvAv“ # play with this three conditions # rare situations where you do not need to enclose strings: ' ', or " “ switch(switchIndicator, A = {print(switchIndicator)}, switchIndicator = {unlist(strsplit(switchIndicator,"h"))}, AvAvAv = {print(nchar(switchIndicator))} )
  • 9. switch() Intro to R for Data Science Session 4: Control Flow # Introduction to R for Data Science # SESSION 4 :: 19 May, 2016 type = 2 cc <- c("A", "B", "C") switch(type, c1 = {print(cc[1])}, c2 = {print(cc[2])}, c3 = {print(cc[3])}, {print("Beyond C...")} # default choice ); # However…
  • 10. switch() Intro to R for Data Science Session 4: Control Flow # Introduction to R for Data Science # SESSION 4 :: 19 May, 2016 # if you do this, R will miss the default choice, so be careful w. switch: type = 4 cc <- c("A", "B", "C") switch(type, print(cc[1]), print(cc[2]), print(cc[3]), {print("Beyond C...")} # the unnamed default choice works only # if previous choices are named! ) # switch is faster than if… else… (!)
  • 11. Vectorization
    ### vectorization in R
    dataSet <- USArrests;
    # dataSet$Murder, dataSet$Assault, dataSet$Rape: columns of dataSet
    # in behavioral sciences (psychology or biomedical sciences, for example) we would call them
    # variables (or, even more often, factors)
    # in data science and machine learning, we usually call them FEATURES
    # in psychology and behavioral sciences, the usage of the term "feature" is usually constrained
    # to theories of categorization and concept learning
    # Task: classify the US states according to some global indicator of violent crime
    # Two categories (simplification): more dangerous (T) and less dangerous (F)
    # We have three features: Murder, Rape, Assault, all per 100,000 inhabitants
    # The idea is to combine the three available features.
    # Let's assume that we arbitrarily assign the following preference order over the features:
    # Murder > Rape > Assault
    # in terms of the severity of the consequences of the associated criminal acts
  • 12. Vectorization
    # Let's first isolate the features from the data.frame
    featureMatrix <- as.matrix(dataSet[, c(1,4,2)]); # columns: Murder, Rape, Assault
    # Let's WEIGHT the features in accordance with the imposed preference order:
    weightsVector <- c(3,2,1); # mind the order of the columns in featureMatrix
    # Essentially, we want our global indicator to be a linear combination of all three selected
    # features, where each feature is weighted by the corresponding element of weightsVector:
    featureMatrix <- cbind(featureMatrix, numeric(nrow(featureMatrix)));
    for (i in seq_len(nrow(featureMatrix))) {
      featureMatrix[i,4] <- sum(weightsVector*featureMatrix[i,1:3]);
      # don't forget: this "*" multiplication in R is vectorized and operates element-wise
      # weightsVector and featureMatrix[i,1:3] both have three elements, OK
      # sum() then produces the desired linear combination
    }
  • 13. Vectorization
    # Classification; in the simplest case, let's simply take a look at
    # the distribution of our global indicator:
    hist(featureMatrix[,4],20);
    # it's multimodal and not too symmetric; go for the median
    criterion <- median(featureMatrix[,4]);
    # And classify:
    dataSet$Dangerous <- ifelse(featureMatrix[,4]>=criterion,TRUE,FALSE);
    # Ok. In practice you would not do this before you have a model that has actually
    # *learned* the most adequate feature weights. This is an exercise only.
    # ***Important***: have you seen the for loop above? Well...
    # Never do that.
    dataSet$Dangerous <- NULL;
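A minimal sketch of the median split on its own, assuming the same arbitrary weights (3, 2, 1 for Murder, Rape, Assault). The names `indicator`, `via_ifelse`, and `via_compare` are illustrative. It also shows that a comparison already returns a logical vector, so `ifelse(..., TRUE, FALSE)` and the bare comparison give the same result.

```r
# Weighted indicator built with vectorized column arithmetic
# (arbitrary exercise weights, not learned from data)
indicator <- 3 * USArrests$Murder + 2 * USArrests$Rape + 1 * USArrests$Assault

# Median split: states at or above the median count as "dangerous"
criterion <- median(indicator)

via_ifelse  <- ifelse(indicator >= criterion, TRUE, FALSE)
via_compare <- indicator >= criterion

# Both forms yield the same logical vector
print(identical(via_ifelse, via_compare))

# How many states fall into each class
print(table(via_compare))
```

`ifelse()` becomes necessary once the two outcomes are something other than `TRUE` and `FALSE`, for example the labels "more dangerous" and "less dangerous".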
  • 14. Vectorization
    # In Data Science, you will be working with huge amounts of quantitative data.
    # For loops are slow. But in vector programming languages like R...
    # matrix computations are seriously fast.
    # What you ***want to do*** is the following:
    # Let's first isolate the features from the data.frame
    featureMatrix <- as.matrix(dataSet[, c(1,4,2)]); # columns: Murder, Rape, Assault
    # Let's WEIGHT the features in accordance with the imposed preference order:
    weightsVector <- c(3,2,1); # mind the order of the columns in featureMatrix
    # Feature weighting:
    wF <- weightsVector %*% t(featureMatrix);
    # In R, t() is for: transpose
    # In R, %*% is matrix multiplication
  • 15. Vectorization
    # oh yes: R knows about row and column vectors - and you want to put this one
    # as a COLUMN in your dataSet data.frame, while wF is currently a ROW vector, look:
    wF
    length(wF)
    wF <- t(wF)
    # and classify:
    dataSet$Dangerous <- ifelse(wF>=median(wF),TRUE,FALSE);
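The loop and the matrix product can be checked against each other. The sketch below is self-contained and uses illustrative names (`features`, `weights`, `loop_indicator`, `mat_indicator`). It multiplies in the order `features %*% weights`, which is one way to obtain a column directly, without the two transposes.

```r
# Columns selected by name: Murder, Rape, Assault
features <- as.matrix(USArrests[, c("Murder", "Rape", "Assault")])
weights  <- c(3, 2, 1)   # arbitrary exercise weights, same order as the columns

# Loop version: one weighted sum per state
loop_indicator <- numeric(nrow(features))
for (i in seq_len(nrow(features))) {
  loop_indicator[i] <- sum(weights * features[i, ])
}

# Matrix version: (50 x 3) %*% (3 x 1) gives a 50 x 1 column
mat_indicator <- features %*% weights
print(dim(mat_indicator))

# Drop the matrix structure to get a plain numeric vector
mat_indicator <- as.vector(mat_indicator)

# Same numbers either way
print(all.equal(loop_indicator, mat_indicator))
```

A plain vector, rather than a one-column matrix, is usually the more convenient thing to store as a data.frame column.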