This document provides an overview of descriptive statistics and functions in R. It shows how to create frequency tables and cross tabulations for one-, two-, and three-dimensional data. It also lists built-in R functions for calculating common statistics such as the mean, median, and standard deviation, as well as functions for probability distributions. Finally, it provides references for further reading on R and statistics.
2. Contents
Tabulation & Cross Tabulation
Built-in Functions for Descriptive Statistics
Built-in Functions for Probability Distribution
References
3. Frequency Tables
To create a one-dimensional frequency table
Load & understand data
data = iris; names(data); help(iris)
Create frequency table for variable ‘Species’ of ‘iris’ data set
table(iris$Species)
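As a minimal sketch of the step above, the following uses only the built-in iris data. The variable name species_counts is an illustrative choice, not part of the original slides.

```r
# Count how many rows fall into each level of Species (built-in iris data)
species_counts <- table(iris$Species)
print(species_counts)

# A one-dimensional table can be indexed by level name
species_counts["setosa"]

# Sort the table by frequency, largest count first
sort(species_counts, decreasing = TRUE)
```

Because table() returns an object that behaves much like a named vector, the counts can be passed straight to functions such as barplot() or prop.table().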
4. Cross Tabulation
To create a two-dimensional contingency table
Load & understand data
data = iris; names(data); help(iris)
Create frequency table of sepal length of different species of ‘iris’ data set
table(iris$Sepal.Length, iris$Species)
Example 2:
mtcars; table(mtcars$cyl, mtcars$gear)
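The two-way tables above can be made easier to read by labelling the dimensions. The sketch below assumes only the built-in mtcars and iris data; the names cyl_gear and length_bins, and the bin boundaries passed to cut(), are illustrative choices. Binning is shown because a continuous variable such as Sepal.Length otherwise produces one row per distinct value.

```r
# Cross-tabulate number of cylinders against number of gears,
# labelling the two dimensions with dnn
cyl_gear <- table(mtcars$cyl, mtcars$gear, dnn = c("cyl", "gear"))
print(cyl_gear)

# Look up one cell by its row and column labels
cyl_gear["4", "4"]   # cars with 4 cylinders and 4 gears

# The same table built with the formula interface
xtabs(~ cyl + gear, data = mtcars)

# Group a continuous variable into intervals before tabulating it
length_bins <- cut(iris$Sepal.Length, breaks = c(4, 5, 6, 7, 8))
binned <- table(length_bins, iris$Species)
print(binned)
```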
5. Cross Tabulation
To create a two-dimensional contingency table
Load & understand data
people2 <- read.csv("people2.csv"); names(people2)
Create frequency table of eye color of people categorized by sex
table2 <- table(people2$Eye.Color, people2$Sex)
6. Cross Tabulation
Table of proportions
prop.table(table2)
# table with each cell count expressed as a proportion of the total count
prop.table(table2, margin=1)
# table with each cell count expressed as a proportion of its row total
prop.table(table2, margin=2)
# table with each cell count expressed as a proportion of its column total
Table of percentages
round(prop.table(table2)*100)
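Since people2.csv is not included here, the sketch below illustrates the same calls on a table built from the built-in mtcars data. The object names tab, overall, by_row, and by_col are illustrative.

```r
# Two-way table of cylinders by gears from the built-in mtcars data
tab <- table(mtcars$cyl, mtcars$gear, dnn = c("cyl", "gear"))

overall <- prop.table(tab)              # every cell divided by the grand total
by_row  <- prop.table(tab, margin = 1)  # each row sums to 1
by_col  <- prop.table(tab, margin = 2)  # each column sums to 1

# Row percentages rounded to one decimal place
round(by_row * 100, 1)
```

A quick check with rowSums(by_row) or colSums(by_col) confirms which margin was used, which helps when the margin argument is easy to mix up.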
7. Cross Tabulation
Table with marginal sums
addmargins(table2)
# table with row and column totals
addmargins(table2, margin=1)
# table with an added "Sum" row holding the column totals
addmargins(table2, margin=2)
# table with an added "Sum" column holding the row totals
To perform chi-square test of association
summary(table2)
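The marginal sums and the chi-square test can be tried on built-in data as follows. This is a sketch under the assumption that mtcars stands in for people2.csv; the object names are illustrative. chisq.test() is shown next to summary() because it returns the test statistic and p-value as accessible components.

```r
tab <- table(mtcars$cyl, mtcars$gear, dnn = c("cyl", "gear"))

with_totals  <- addmargins(tab)              # adds a "Sum" row and a "Sum" column
with_sum_row <- addmargins(tab, margin = 1)  # adds only a "Sum" row (column totals)
with_sum_col <- addmargins(tab, margin = 2)  # adds only a "Sum" column (row totals)
print(with_totals)

# summary() on a table reports a Pearson chi-square test of independence
tab_summary <- summary(tab)
print(tab_summary)

# chisq.test() gives the same statistic for a table larger than 2 x 2.
# The expected counts here are small, so R issues a warning about the
# chi-square approximation; it is suppressed only to keep the output short.
test <- suppressWarnings(chisq.test(tab))
test$statistic
test$p.value
```

For a 2 x 2 table, chisq.test() applies a continuity correction by default, so its statistic would differ from the one reported by summary().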
8. Cross Tabulation
To create a three-dimensional contingency table
Load & understand data
people2 <- read.csv("people2.csv"); names(people2)
Create frequency table of eye color of people categorized by their sex & height
table(people2$Eye.Color, people2$Sex, people2$Height.Cat)
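A three-way table prints as a stack of two-way slices, which can be hard to scan. The sketch below, which assumes the built-in mtcars data in place of people2.csv, shows one way to flatten it with ftable() and to collapse it back to two dimensions. The name tab3 is illustrative.

```r
# Three-way table: cylinders x gears x transmission (am: 0 = automatic, 1 = manual)
tab3 <- table(mtcars$cyl, mtcars$gear, mtcars$am,
              dnn = c("cyl", "gear", "am"))
dim(tab3)

# ftable() lays the three-way array out as a single flat table
ftable(tab3)

# Sum over the third variable to recover the two-way cyl x gear table
two_way <- margin.table(tab3, margin = c(1, 2))
print(two_way)
```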
9. Built-in functions for Descriptive Statistics
Statistic                    Function name/command
Mean                         mean
Median                       median
Standard deviation           sd
Median absolute deviation    mad
Variance                     var
Maximum value                max
Minimum value                min
Range                        range
Interquartile range          IQR
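The functions in this table can be tried on a small vector. The sample x below is made up for illustration; the comments give the values R returns for it.

```r
x <- c(2, 4, 4, 5, 7, 9, 11)   # hypothetical sample

mean(x)     # 6
median(x)   # 5
var(x)      # 10 (sample variance, divides by n - 1)
sd(x)       # square root of the variance, about 3.16
mad(x)      # 2.9652 (median absolute deviation 2, scaled by 1.4826)
max(x)      # 11
min(x)      # 2
range(x)    # 2 11
IQR(x)      # 4 (third quartile 8 minus first quartile 4)

# These functions return NA when x contains missing values,
# unless na.rm = TRUE is supplied
mean(c(x, NA), na.rm = TRUE)   # 6
```

Note that mad() multiplies the raw median absolute deviation by the constant 1.4826 by default, so that it is comparable to the standard deviation for normally distributed data.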
10. Built-in functions for Descriptive Statistics
Statistic                                                  Function name/command
Quantiles                                                  quantile
Tukey’s five-number summary
[min, 1st, 2nd, 3rd quartile, max value]                   fivenum
Sum                                                        sum
Product                                                    prod
Number of observations                                     length
Standardize / computing z-score                            scale(x, center=TRUE, scale=TRUE)
Mean centering                                             scale(x, scale=FALSE)
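A short sketch of these functions on a made-up vector follows. The sample x and the names z and centered are illustrative. One detail worth knowing: fivenum() uses Tukey's hinges, which agree with the default quantile() quartiles for this sample but can differ slightly for other sample sizes.

```r
x <- c(2, 4, 4, 5, 7, 9, 11)   # hypothetical sample

quantile(x)                       # 0%, 25%, 50%, 75%, 100%: 2 4 5 8 11
quantile(x, probs = c(0.1, 0.9))  # other probabilities can be requested
fivenum(x)                        # 2 4 5 8 11
sum(x)                            # 42
prod(x)                           # 110880
length(x)                         # 7

# z-scores: (x - mean(x)) / sd(x), returned as a one-column matrix
z <- scale(x)

# Mean centering only: x - mean(x)
centered <- scale(x, scale = FALSE)
as.vector(centered)               # -4 -2 -2 -1 1 3 5
```

Wrapping the result in as.vector() drops the matrix shape and the attributes that scale() attaches, which is convenient when a plain numeric vector is needed.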
11. Built-in functions for Probability Distribution
Statistic                        Function name/command
Normal PDF                       dnorm
Cumulative normal probability    pnorm
Normal quantile                  qnorm
Random normal variable           rnorm
Similarly, there are functions for other distributions, such as rbinom, rpois, rt, runif, and rf.
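The four prefixes d, p, q, and r follow the same pattern across distributions. The sketch below shows them for the normal distribution and then for a few others; the seed value and the parameters are arbitrary choices for illustration.

```r
dnorm(0)        # density of the standard normal at 0, about 0.3989
pnorm(1.96)     # P(Z <= 1.96), about 0.975
qnorm(0.975)    # value with 97.5% of the probability below it, about 1.96

set.seed(42)    # arbitrary seed so the random draws are reproducible
draws <- rnorm(5, mean = 10, sd = 2)   # five draws from N(10, sd = 2)
print(draws)

# The same prefixes work for other distributions
dbinom(3, size = 10, prob = 0.5)   # P(X = 3) for Binomial(10, 0.5) = 0.1171875
ppois(2, lambda = 1)               # P(X <= 2) for Poisson(1)
runif(3)                           # three draws from Uniform(0, 1)
```

pnorm() and qnorm() are inverses of each other, so pnorm(qnorm(p)) returns p for any probability p strictly between 0 and 1.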
13. My Interesting answers/posts
To understand results of logistic regression or other classifiers
https://learnerworld.tumblr.com/post/152327498485/enjoystatisticswithmebinaryclassifierperformance
Hypothesis testing in layman’s terms
https://learnerworld.tumblr.com/search/hypothesis
Understanding mediation effect
https://learnerworld.tumblr.com/post/146541892120/mediation-effectenjoystatisticswtihme
14. My Interesting answers/posts
Dependence vs. Correlation
https://www.quora.com/What-is-the-difference-between-dependence-and-correlation/answer/Nisha-Arora-9
Collinearity & Correlation
https://www.quora.com/In-statistics-what-is-the-difference-between-collinearity-and-correlation/answer/Nisha-Arora-9
15. References
• Crawley, M. J. (2007). The R Book. Chichester, England: John Wiley & Sons, Ltd.
• Venables, W. N., Smith, D. M., & the R Core Team. An Introduction to R.
• Adler, J. R in a Nutshell. O’Reilly.
• Teetor, P. (2011). R Cookbook. Sebastopol, CA: O’Reilly Media, Inc.