This document covers R commands for generating sequences of numbers, including seq() and related functions. It demonstrates how to create sequences with regular increments, decreasing values, negative numbers, and non-integer increments. Examples show how to add colour, titles, and axis labels to histograms to visualize a data distribution. It closes with measures of central tendency and spread for a sample dataset, including the mean, variance, standard deviation, and median, and the difference between the fivenum and quantile summaries.
R part II
1. Vector with regularly spaced numbers
> 1:10
 [1]  1  2  3  4  5  6  7  8  9 10
> seq(1,10)
 [1]  1  2  3  4  5  6  7  8  9 10
> seq(1,10,2)
[1] 1 3 5 7 9
• We have used both the “:” operator and the seq command
• Note the last command, where “2” is the step; this is the “by” argument of the seq command
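The same “by” argument also covers decreasing, negative and non-integer steps. The lines below are a minimal sketch; the variable names (down, neg, frac, pts) are chosen here for illustration and do not come from the slides.

```r
# Decreasing sequence: the step given to "by" must be negative
down <- seq(10, 1, by = -3)
# A sequence that passes through negative numbers
neg <- seq(-4, 4, by = 2)
# A non-integer step
frac <- seq(0, 1, by = 0.25)
# Fix the number of points instead of the step size
pts <- seq(0, 1, length.out = 5)

print(down)
print(neg)
print(frac)
print(pts)
```

Using length.out instead of by is handy when you know how many points you want rather than how far apart they should be.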
4. Think and try …...
Generate a sequence of the following numbers:
0.0 0.2 0.4 0.6 0.8 1.0 1.0 2.0 3.0 4.0 5.0
6.0 7.0 8.0 9.0 10.0 100.0
Hints
• You have to use more than one sequence.
• But how will you include “100”?
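One possible answer, sketched under the assumption that the target is a fractional run from 0 to 1, followed by the integers 1 to 10, followed by the single value 100. The name target is made up for this example; c() glues the pieces together.

```r
# Combine two sequences and one extra value with c()
target <- c(seq(0, 1, by = 0.2),  # 0.0 0.2 ... 1.0
            1:10,                 # 1 2 ... 10
            100)                  # the lone value at the end
print(target)
```

Other combinations give the same result; this is only one way to do it.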
7. Try the rep (replicate) command
> rep(1:4, each = 2)
[1] 1 1 2 2 3 3 4 4
> rep(1:4, c(2,2,2,2))
[1] 1 1 2 2 3 3 4 4
> rep(5:8, c(2,1,2,1))
[1] 5 5 6 7 7 8
> rep(1:4, each = 2, len = 4)
[1] 1 1 2 2
Hope you are enjoying this as we go. Have you noted the
arguments “each” and “len” (short for “length.out”)? Now note
the “times” argument:
> rep(1:4, each = 2, times = 3)
[1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4
8. Try Histogram….
Suppose the top 25 ranked movies made the following gross receipts
for a week:
29.6 28.2 19.6 13.7 13.0 7.8 3.4 2.0 1.9 1.0 0.7 0.4 0.4 0.3
0.3 0.3 0.3 0.3 0.2 0.2 0.2 0.1 0.1 0.1 0.1 0.1
Scan the data and then draw some histograms.
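A minimal sketch of one way to enter the values: scan() reads whitespace-separated numbers. The text argument is used here only so that the example is self-contained; the slides do not show how x was typed in, so treat this as an assumption.

```r
# Read the receipts from a character string instead of the keyboard
x <- scan(text = "29.6 28.2 19.6 13.7 13.0 7.8 3.4 2.0 1.9 1.0 0.7 0.4 0.4 0.3
                  0.3 0.3 0.3 0.3 0.2 0.2 0.2 0.1 0.1 0.1 0.1 0.1")
# Check how many values were read (the listing above contains 26)
print(length(x))
```

Calling scan() with no arguments lets you type or paste the numbers at the prompt and finish with an empty line.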
> x
 [1] 29.6 28.2 19.6 13.7 13.0  7.8  3.4  2.0  1.9  1.0  0.7  0.4  0.4  0.3  0.3  0.3
[17]  0.3  0.3  0.2  0.2  0.2  0.1  0.1  0.1  0.1  0.1
> receipts<-x
> hist(receipts)
10. Now try better histograms ….
Add colour, change colour, add title for the histogram, add title
for x-axis and then y-axis
> hist(receipts, col="red2")
> hist(receipts, col="red4")
> hist(receipts, col="red2",
+      main="Gross Receipts for first 25 ranked movies")
> hist(receipts, col="red2",
+      main="Gross Receipts for first 25 ranked movies",
+      xlab="receipts in a week")
> hist(receipts, col="red2",
+      main="Gross Receipts for first 25 ranked movies",
+      xlab="receipts in a week", ylab="count of movies")
11. Now try better histograms ….
Your new histogram should look like this:
[Figure: histogram of receipts with colour, main title and axis titles]
12. Now try better histograms ….
Now set the range for the x-axis and y-axis
> hist(receipts, col="red2",
+      main="Gross Receipts for first 25 ranked movies",
+      xlab="receipts in a week", ylab="count of movies",
+      xlim=c(0.1,35), ylim=c(0,25))
13. Now more about histograms ….
Now try breaks=….
What is “breaks”?
> hist(receipts, breaks=3, col="red2",
+      main="Gross Receipts for first 25 ranked movies",
+      xlab="receipts in a week", ylab="count of movies")
Remember: a single number given as “breaks” is just a suggestion to R;
R picks nearby “pretty” break points
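To see which break points R actually chose, you can inspect the object that hist() returns. This is a minimal sketch; plot = FALSE suppresses the drawing so only the numbers are shown, and the name h is made up for this example.

```r
receipts <- c(29.6, 28.2, 19.6, 13.7, 13.0, 7.8, 3.4, 2.0, 1.9, 1.0,
              0.7, 0.4, 0.4, 0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2,
              0.1, 0.1, 0.1, 0.1, 0.1)
# Ask for about 3 cells, but compute only (no plot)
h <- hist(receipts, breaks = 3, plot = FALSE)
print(h$breaks)  # the break points R settled on
print(h$counts)  # how many values fell into each cell
```

Try other values of breaks and watch how h$breaks changes; the number of cells will not always match the number you asked for.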
14. Now more about breaks ….
“breaks” can also specify the actual break points
in a histogram
> hist(receipts,breaks=c(0,1,2,3,4,5,10,20,max(x)),col="violetred")
Note the break points
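A short sketch of what the explicit break points do to the counts. One thing worth knowing: when the cells are not all the same width, hist() plots densities rather than counts by default. The names bp and h are made up for this example.

```r
receipts <- c(29.6, 28.2, 19.6, 13.7, 13.0, 7.8, 3.4, 2.0, 1.9, 1.0,
              0.7, 0.4, 0.4, 0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2,
              0.1, 0.1, 0.1, 0.1, 0.1)
# The same break points as on the slide
bp <- c(0, 1, 2, 3, 4, 5, 10, 20, max(receipts))
h <- hist(receipts, breaks = bp, plot = FALSE)
# One row per cell: left edge, right edge, number of values
print(data.frame(from = bp[-length(bp)], to = bp[-1], count = h$counts))
# FALSE here, because the cells have different widths
print(h$equidist)
```

The break points must cover the whole range of the data, which is why the last one is set to the maximum.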
15. Summary and Fivenum
Suppose CEO yearly compensations are sampled and the
following values are found (in millions).
12 0.4 5 2 50 8 3 1 4 0.25
> sals
[1] 12.00 0.40 5.00 2.00 50.00 8.00 3.00 1.00 4.00 0.25
> mean(sals) # the average
[1] 8.565
> var(sals) # the variance
[1] 225.5145
> sd(sals) # the standard deviation
[1] 15.01714
> median(sals) # the median
[1] 3.5
> summary(sals)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.250   1.250   3.500   8.565   7.250  50.000
> fivenum(sals) # min, lower hinge, Median, upper hinge, max
[1] 0.25 1.00 3.50 8.00 50.00
> quantile(sals)
   0%   25%   50%   75%  100%
 0.25  1.25  3.50  7.25 50.00
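To see where the mean, variance and standard deviation come from, they can be rebuilt from the basic formulas. This is a minimal sketch; the names n, m, v and s are made up for this example. Note that var() divides by n - 1, not n.

```r
sals <- c(12, 0.4, 5, 2, 50, 8, 3, 1, 4, 0.25)
n <- length(sals)
m <- sum(sals) / n                 # the mean
v <- sum((sals - m)^2) / (n - 1)   # the sample variance
s <- sqrt(v)                       # the standard deviation
print(c(mean = m, var = v, sd = s))
# Compare with the built-in functions
print(all.equal(c(m, v, s), c(mean(sals), var(sals), sd(sals))))
```

The single value of 50 pulls the mean (8.565) well above the median (3.5), which is why both are worth reporting.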
17. Difference between Fivenum and Quantile: Lower and Upper Hinge
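The sorted data:
0.25 0.4 1 2 3 | 4 5 8 12 50
Median = 3.5 (the average of the two middle values, 3 and 4; it is not itself a data point)
• The lower hinge is the median of all the data to the left of
the median: here 0.25 0.4 1 2 3, which gives 1. When the number of
observations is odd, so that the median is itself a data point,
fivenum counts that point in both halves.
• The upper hinge is similarly defined: here the median of
4 5 8 12 50, which gives 8.
• quantile, by default, interpolates between neighbouring data
points instead, which is why it reports 1.25 and 7.25.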
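The hinges for this sample can be checked by hand. This is a minimal sketch that relies on the sample having an even number of values, so the two halves do not share a point; the names s, half, lower_hinge and upper_hinge are made up for this example.

```r
sals <- c(12, 0.4, 5, 2, 50, 8, 3, 1, 4, 0.25)
s <- sort(sals)
half <- length(s) / 2                            # 5, because n = 10 is even
lower_hinge <- median(s[1:half])                 # median of the lower half
upper_hinge <- median(s[(half + 1):length(s)])   # median of the upper half
print(c(lower_hinge, upper_hinge))
# The hinges as reported by fivenum (positions 2 and 4)
print(fivenum(sals)[c(2, 4)])
# The quartiles as reported by quantile, for comparison
print(quantile(sals, c(0.25, 0.75)))
```

The hinges are always data points or midpoints between two data points, while the default quantile can land anywhere between two neighbours.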