Revision workshop 17 january 2013

REVISION WORKSHOP
NUBE
17TH JANUARY 2013

Organising and graphing quantitative data in a frequency
distribution table.
• A frequency table consists of a number of classes; each
observation is counted and recorded in the frequency of its
class.
• If n observations need to be classified into a frequency table,
determine:
– Number of classes: $c = 1 + 3{,}3 \log n$ (log to base 10)
– Class width: $\dfrac{x_{\max} - x_{\min}}{c}$

Organising and graphing quantitative data in a frequency
distribution table.
Example:
The following data represents the number of telephone calls received
for two days at a municipal call centre. The data was measured per
hour.

8 11 12 20 18 10 14 18 16 9
5 7 11 12 15 14 16 9 17 11
6 18 9 15 13 12 11 6 10 8
11 13 22 11 11 14 11 10 9
19 14 17 9 3 3 16 8 2

Frequency distribution
Number of classes: $c = 1 + 3{,}3 \log n = 1 + 3{,}3 \log 48 = 6{,}5 \approx 7$
Class width: $\dfrac{x_{\max} - x_{\min}}{c} = \dfrac{22 - 2}{7} = 2{,}86 \approx 3$
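The two formulas can be sketched in Python as a quick check. This is a minimal sketch, not part of the original slides: the function names are hypothetical, the logarithm is assumed to be base 10, and both results are rounded up, an assumption that matches the rounding in the worked example (6,5 to 7 and 2,86 to 3).

```python
import math

def number_of_classes(n):
    """c = 1 + 3.3 * log10(n), rounded up (rounding rule is an assumption)."""
    return math.ceil(1 + 3.3 * math.log10(n))

def class_width(x_min, x_max, c):
    """Class width = (x_max - x_min) / c, rounded up (assumption)."""
    return math.ceil((x_max - x_min) / c)

c = number_of_classes(48)        # 48 hourly observations
width = class_width(2, 22, c)    # smallest value 2, largest value 22
print(c, width)
```

For the call-centre data this gives 7 classes of width 3, the same values used in the tables that follow.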
– First class: [xmin ; xmin + class width) = [2 ; 2 + 3) = [2;5)
– Second class: [5 ; 5 + 3) = [5;8)

"[" value is included in class
")" value is excluded from class

Classes Count
[2;5) ||| 3
[5;8) |||| 4
[8;11) ||||||||||| 11
[11;14) ||||||||||||| 13
[14;17) ||||||||| 9
[17;20) |||||| 6
[20;23) || 2

Classes Frequency (f)
[2;5) 3
[5;8) 4
[8;11) 11
[11;14) 13
[14;17) 9
[17;20) 6
[20;23) 2
Total 48

Classes f % frequency
[2;5) 3 3/48×100 = 6,3
[5;8) 4 4/48×100 = 8,3
[8;11) 11 11/48×100 = 22,9
[11;14) 13 27,1
[14;17) 9 18,8
[17;20) 6 12,5
[20;23) 2 4,2
Total 48 100

Classes f %f Cumulative frequency (F)
[2;5) 3 6,3 3
[5;8) 4 8,3 3+4=7
[8;11) 11 22,9 7 + 11 = 18
[11;14) 13 27,1 18 + 13 = 31
[14;17) 9 18,8 31 + 9 = 40
[17;20) 6 12,5 40 + 6 = 46
[20;23) 2 4,2 46 + 2 = 48
Total 48 100

Classes f %f F %F
[2;5) 3 6,3 3 3/48×100 = 6,3
[5;8) 4 8,3 7 7/48×100 = 14,6
[8;11) 11 22,9 18 18/48×100 = 37,5
[11;14) 13 27,1 31 64,6
[14;17) 9 18,8 40 83,3
[17;20) 6 12,5 46 95,8
[20;23) 2 4,2 48 100
Total 48 100

Classes f F Class mid-points (x)
[2;5) 3 3 (2 + 5)/2 = 3,5
[5;8) 4 7 (5 + 8)/2 = 6,5
[8;11) 11 18 (8 + 11)/2 = 9,5
[11;14) 13 31 (11 + 14)/2 = 12,5
[14;17) 9 40 15,5
[17;20) 6 46 18,5
[20;23) 2 48 21,5
Total 48

Classes f %f F %F (x)
[2;5) 3 6,3 3 6,3 3,5
[5;8) 4 8,3 7 14,6 6,5
[8;11) 11 22,9 18 37,5 9,5
[11;14) 13 27,1 31 64,6 12,5
[14;17) 9 18,8 40 83,3 15,5
[17;20) 6 12,5 46 95,8 18,5
[20;23) 2 4,2 48 100 21,5
Total 48 100
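The complete table above can be rebuilt from the raw data with a short Python sketch. This is an illustration under stated assumptions, not the method used in the slides: the function name and dictionary keys are hypothetical, and classes are treated as half-open intervals [lower; upper) as defined earlier.

```python
# The 48 hourly call counts from the example
calls = [
    8, 11, 12, 20, 18, 10, 14, 18, 16, 9,
    5, 7, 11, 12, 15, 14, 16, 9, 17, 11,
    6, 18, 9, 15, 13, 12, 11, 6, 10, 8,
    11, 13, 22, 11, 11, 14, 11, 10, 9,
    19, 14, 17, 9, 3, 3, 16, 8, 2,
]

def frequency_table(data, start, width, n_classes):
    """Build f, %f, F, %F and the class midpoint for each class [lower; upper)."""
    n = len(data)
    rows = []
    cumulative = 0
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width
        f = sum(1 for x in data if lower <= x < upper)  # lower included, upper excluded
        cumulative += f
        rows.append({
            "class": (lower, upper),
            "f": f,
            "pct_f": 100 * f / n,
            "F": cumulative,
            "pct_F": 100 * cumulative / n,
            "mid": (lower + upper) / 2,
        })
    return rows

table = frequency_table(calls, start=2, width=3, n_classes=7)
for row in table:
    lower, upper = row["class"]
    print(f"[{lower};{upper})  f={row['f']}  %f={row['pct_f']:.1f}  "
          f"F={row['F']}  %F={row['pct_F']:.1f}  x={row['mid']}")
```

The printed values use a decimal point where the slides use a decimal comma.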

Histograms
Plot the classes on the x-axis and the frequency (f or %f) on the y-axis.
Classes f %f
[2;5) 3 6,3
[5;8) 4 8,3
[8;11) 11 22,9
[11;14) 13 27,1
[14;17) 9 18,8
[17;20) 6 12,5
[20;23) 2 4,2

Histograms
[Figure: histogram "Number of telephone calls per hour at a municipal call centre". x-axis: Number of calls (class boundaries 2, 5, 8, 11, 14, 17, 20, 23); y-axis: Number of hours (0 to 14).]

Definitions
Frequency Polygon
A line graph of a frequency distribution that offers a
useful alternative to a histogram. A frequency polygon is
useful in conveying the shape of the distribution.
Ogive
A graphic representation of the cumulative frequency
distribution. Used for approximating the number of
values less than or equal to a specified value.

Frequency polygons
Plot the class mid-points on the x-axis and the frequency (f or %f) on the y-axis.
Class mid-points (x) f %f
3,5 3 6,3
6,5 4 8,3
9,5 11 22,9
12,5 13 27,1
15,5 9 18,8
18,5 6 12,5
21,5 2 4,2

Frequency polygons
[Figure: frequency polygon "Number of telephone calls per hour at a municipal call centre". x-axis: Number of calls (class mid-points 3,5 to 21,5); y-axis: Number of hours (0 to 14).]
Arbitrary mid-points (0,5 and 24,5) are added to close the polygon.

Ogives
Plot the classes on the x-axis and the cumulative frequency (F or %F) on the y-axis.
Classes F %F
[2;5) 3 6,3
[5;8) 7 14,6
[8;11) 18 37,5
[11;14) 31 64,6
[14;17) 40 83,3
[17;20) 46 95,8
[20;23) 48 100

Ogives
[Figure: ogive "Number of calls received at a call centre per hour". x-axis: Number of calls (2 to 23); y-axis: % cumulative number of hours (0 to 100).]
None of the hours had fewer than 2 calls.

Ogives
[Figure: ogive "Number of calls received at a call centre per hour". x-axis: Number of calls (2 to 23); y-axis: % cumulative number of hours (0 to 100).]
Reading from the ogive:
• About 80% of the hours had fewer than 17 calls per hour.
• About 20% of the hours had more than 17 calls per hour.
• About 50% of the hours had fewer than 12 calls per hour.

Exam question 2
A garbage removal company would like to start charging by the
weight of a customer's bin rather than by the number of bins put
out. They select a sample of 25 customers and weigh their
garbage bins. The weights in kg are given below:
14.5 5.2 16.0 14.7 15.6 18.9 13.5 24.6 24.5 7.4
13.2 23.4 13.9 12.0 22.5 31.4 16.1 10.9 25.1 22.1
14.8 15.1 4.9 17.0 10.3

1. Construct a frequency table to describe the data. Include a
frequency and relative (%) frequency column. (Hint: start the
class intervals with the whole number just smaller than the
lowest value in the dataset)

Procedure
1. Calculate the range of the dataset
2. Calculate the number of classes
3. Calculate the class width
4. Construct a table showing the intervals calculated in 1 to 3
5. Put in the tally for each interval and then show it as a frequency
6. Calculate the relative (%) frequency

13 marks

Range
31.4 − 4.9 = 26.5
Number of classes
K or c = 1 + 3.3 log n
n = 25: K or c = 1 + 3.3 log(25) = 5.61 ≈ 6

Class width
Class width = (xmax − xmin)/c = 26.5/6 = 4.42 ≈ 5

Number of classes = 6 Class width = 5

INTERVALS TALLY FREQUENCY (f) RELATIVE FREQUENCY (%f)
4 - < 9 ||| 3 12
9 - < 14 |||||| 6 24
14 - < 19 ||||||||| 9 36
19 - < 24 ||| 3 12
24 - < 29 ||| 3 12
29 - < 34 | 1 4
Total 25 100

Exam question 2
2. Comment on the interval containing the lowest percentage.
Answer: 4% of bins weighed between 29 and 34 kg.

3. In which interval do the data tend to cluster? Which descriptive statistics measure, can we assume, would be found in this interval?
Answer: The largest number of bins weighed between 14 and 19 kg. We assume the mode will fall in this interval (highest frequency).

4. Comment on the shape of the distribution without drawing a graph. Give reasons.
Answer: Positively skewed, as more values are located in the lower intervals.
7 MARKS

Quartiles & Box & Whisker Plots

• Quartiles
• Percentiles
• Interquartile range


• QUARTILES
– Order data in ascending order.
– Divide data set into four quarters.

25% 25% 25% 25%
Min Q1 Q2 Q3 Max


Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Determine Q1 for the sample of nine measurements:
•Order the measurements
−4 −3 2 2 5 5 5 6 8
1 2 3 4 5 6 7 8 9

Q1 is the $\frac{1}{4}(n + 1) = \frac{1}{4}(9 + 1) = 2{,}5$th value
Find the difference between the 2nd and 3rd values:
2 − (−3) = 5, and multiply by the decimal portion of the position: 5 × 0,5 = 2,5
Add to the smaller value: −3 + 2,5, so Q1 = −0,5

2 5 8 −3 5 2 6 5 −4
Determine Q3 for the sample of nine measurements:

−4 −3 2 2 5 5 5 6 8
1 2 3 4 5 6 7 8 9

Q3 is the $\frac{3}{4}(n + 1) = \frac{3}{4}(9 + 1) = 7{,}5$th value
Q3 = 5 + 0,5(6 − 5) = 5,5

2 5 8 −3 5 2 6 5 −4
Interquartile range = Q3 – Q1
Q3 = 5,5
Q1 = −0,5
Interquartile range
= 5,5 – (−0,5)
=6
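The position-and-interpolation steps used for Q1 and Q3 can be sketched in Python. This is a minimal sketch with hypothetical function names; it assumes the (n + 1) position rule shown in the slides and that the computed position lies between 1 and n, so very small data sets are not handled.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    whole = int(position)            # whole part of the position
    fraction = position - whole      # decimal portion of the position
    lower = ordered[whole - 1]
    if fraction == 0:
        return lower
    upper = ordered[whole]
    # Add the decimal portion of the gap to the smaller value
    return lower + fraction * (upper - lower)

def quartile(data, q):
    """q-th quartile (q = 1, 2 or 3) using the (n + 1) * q / 4 position rule."""
    ordered = sorted(data)
    position = (len(ordered) + 1) * q / 4
    return value_at_position(ordered, position)

data = [2, 5, 8, -3, 5, 2, 6, 5, -4]
q1 = quartile(data, 1)
q3 = quartile(data, 3)
iqr = q3 - q1
print(q1, q3, iqr)
```

Other software may use a different quartile convention, so results from a calculator or spreadsheet can differ slightly from this rule.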

INTERQUARTILE RANGE (IQR)
• Difference between the third and first
quartiles
• Indicates how far apart the first and third
quartiles are

IQR = Q3 – Q1


BOX & WHISKER PLOT
• Provides a graphical summary of data based
on 5 summary measures or values
– First quartile, median, third quartile ,lower limit,
upper limit
• Box and whisker plot detects outliers in a data
set
LL = Q1 – 1,5 (IQR)
UL = Q3 + 1,5 (IQR)


BOX-AND-WHISKER PLOT
Me = 12,38
Q1 = 9,36
Q3 = 15,67
IQR = 6,31
LL = Q1 – 1,5(IQR) = 9,36 – 1,5(6,31) = –0,11
UL = Q3 + 1,5(IQR) = 15,67 + 1,5(6,31) = 25,14
[Figure: box-and-whisker plot on an axis from 0 to 28; the box spans the IQR, with a distance of 1,5(IQR) marked on either side of the box.]
• Any value smaller than −0,11 will be an outlier.
• Any value larger than 25,14 will be an outlier.

Exam question 3
The Tubeka brothers spent the following amounts (in rand) on groceries over
the last 8 weeks:
54 56 89 67 74 57 43 51

1. Calculate a five number summary table
2. Construct a box and whisker plot for the data
3. Determine whether there are any outliers. Show calculations
20 MARKS

PROCEDURE
1. Reorder the data set
2. Identify maximum and minimum values in dataset
3. Calculate median
4. Calculate Q1 & Q3
5. Construct plot
6. Calculate upper & lower limits for dataset to determine if outliers present

43 51 54 56 57 67 74 89

xmin = 43 xmax = 89 median = (56+57)/2 = 56.5 Q1 = 51.75 Q3 = 72.25

Position of Q1 = (n+1)(¼) = (8+1) × ¼ = 2.25th value
Between 51 & 54
54 − 51 = 3; multiply by the decimal portion of the position, 3 × 0.25 = 0.75, and add to the lower value

Q1 = 51 + 0.75 = 51.75

Position of Q3 = (n+1)(¾) = (8+1) × ¾ = 6.75th value
Between 67 & 74
74 − 67 = 7; multiply by the decimal portion of the position, 7 × 0.75 = 5.25, and add to the lower value

Q3 = 67 + 5.25 = 72.25


OUTLIERS
1. Calculate upper & lower limits

LL = Q1 – 1.5(IQR)
UL = Q3 + 1.5(IQR)
IQR = 72.25 – 51.75 = 20.5

LL = 51.75 – 1.5(20.5) = 21
UL = 72.25 + 1.5(20.5) = 103

No values are smaller than 21 or greater than 103, therefore no outliers are present.
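The whole procedure for this question (five-number summary, limits, outlier check) can be sketched in Python. The function names and dictionary keys are hypothetical, and the sketch assumes the same (n + 1) position rule used in the worked answer.

```python
def value_at_position(ordered, position):
    """Value at a 1-based fractional position, interpolating between neighbours."""
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])

def box_plot_summary(data):
    """Five-number summary plus the 1.5 * IQR limits used to flag outliers."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) * 0.25)
    median = value_at_position(ordered, (n + 1) * 0.50)
    q3 = value_at_position(ordered, (n + 1) * 0.75)
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    outliers = [x for x in ordered if x < lower_limit or x > upper_limit]
    return {
        "min": ordered[0], "q1": q1, "median": median, "q3": q3, "max": ordered[-1],
        "iqr": iqr, "lower_limit": lower_limit, "upper_limit": upper_limit,
        "outliers": outliers,
    }

groceries = [54, 56, 89, 67, 74, 57, 43, 51]
summary = box_plot_summary(groceries)
print(summary)
```

For the grocery data the list of outliers comes back empty, in line with the conclusion above.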

• ARITHMETIC MEAN
– Data is given in a frequency table
– Only an approximate value of the mean

$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i}$

where $f_i$ = frequency of the i th class interval
$x_i$ = class midpoint of the i th class interval

• MEDIAN
– Data is given in a frequency table.
– First cumulative frequency ≥ n/2 will indicate the
median class interval.
– Median can also be determined from the ogive.

$M_e = l_i + \dfrac{(u_i - l_i)\left(\frac{n}{2} - F_{i-1}\right)}{f_i}$

where $l_i$ = lower boundary of the median interval
$u_i$ = upper boundary of the median interval
$F_{i-1}$ = cumulative frequency of the interval preceding the
median interval
$f_i$ = frequency of the median interval

• MODE
– Class interval that has the largest frequency value
will contain the mode.
– Mode is the class midpoint of this class.
– Mode must be determined from the histogram.


Example – The following data represents the number of
telephone calls received for two days at a municipal call centre.
The data was measured per hour.
To calculate the mean for the sample of the 48 hours, determine the class midpoints:
Number of calls Number of hours fi xi
[2–under 5) 3 3,5
[5–under 8) 4 6,5
[8–under 11) 11 9,5
[11–under 14) 13 12,5
[14–under 17) 9 15,5
[17–under 20) 6 18,5
[20–under 23) 2 21,5
n = 48


$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{597}{48} = 12{,}44$

Average number of calls per hour is 12,44.
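The grouped mean can be checked with a few lines of Python. This is a minimal sketch with a hypothetical function name; each class is given as (lower boundary, upper boundary, frequency) and the midpoint stands in for every value in the class, which is why the result is only an approximation of the true mean.

```python
def grouped_mean(classes):
    """Approximate mean from (lower, upper, frequency) triples: sum(f*x) / sum(f)."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f

call_classes = [
    (2, 5, 3), (5, 8, 4), (8, 11, 11), (11, 14, 13),
    (14, 17, 9), (17, 20, 6), (20, 23, 2),
]
mean_calls = grouped_mean(call_classes)
print(round(mean_calls, 2))
```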

Exam question 3
The number of overtime hours worked by 40 part-time employees of a
security company in 1 week is shown in the following frequency
distribution:
Hours per week Frequency (f)
2.1 - < 2.8 12
2.8 - < 3.5 13
3.5 - < 4.2 7
4.2 - < 4.9 5
4.9 - < 5.6 2
5.6 - < 6.3 1

1. Estimate the mean number of overtime hours worked
2. What % of employees worked at least 4.2 hours overtime?
8 marks

Exam question 3
Procedure
1. Calculate the midpoint x for each interval: (lower limit + upper limit)/2
2. Multiply f by the midpoint x
3. Total the fx and f columns
4. Divide ∑fx by ∑f

Exam question 3
Hours per week Frequency (f) Mid point (x) fx
2.1 - < 2.8 12 (2.1 + 2.8)/2 = 2.45 29.4
2.8 - < 3.5 13 3.15 40.95
3.5 - < 4.2 7 3.85 26.95
4.2 - < 4.9 5 4.55 22.75
4.9 - < 5.6 2 5.25 10.5
5.6 - < 6.3 1 5.95 5.95
Total 40 136.5

Mean = 136.5/40 = 3.41 hrs
Employees who worked at least 4.2 hrs = 5 + 2 + 1 = 8; 8/40 × 100 = 20%

• PERCENTILES
– Order data in ascending order.
– Divide data set into hundred parts.

10% 90%
Min P10 Max

80% 20%
Min P80 Max

50% 50%
Min P50 = Q2 Max

2 5 8 −3 5 2 6 5 −4
Determine P20 for the sample of nine measurements:

−4 −3 2 2 5 5 5 6 8
1 2 3 4 5 6 7 8 9

P20 is the $(n + 1)\frac{p}{100} = (9 + 1)\frac{20}{100} = 2$nd value

P20 = −3

Example – The following data represents the number of telephone calls
received for two days at a municipal call centre. The data was measured per hour.
Position of P60
= np/100
= 48(60)/100
= 28,8
The first cumulative frequency ≥ 28,8 is 31, so P60 lies in the class [11–under 14).
Number of calls Number of hours fi F
[2–under 5) 3 3
[5–under 8) 4 7
[8–under 11) 11 18
[11–under 14) 13 31
[14–under 17) 9 40
[17–under 20) 6 46
[20–under 23) 2 48
n = 48

P60
$P_p = l_p + \dfrac{(u_p - l_p)\left(\frac{np}{100} - F_{p-1}\right)}{f_p}$
$P_{60} = 11 + \dfrac{(14 - 11)(28{,}8 - 18)}{13} = 13{,}49$

60% of the time there are fewer than 13,49 calls per hour, or 40% of the
time there are more than 13,49 calls per hour.
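The percentile formula for grouped data can be sketched in Python. This is an illustration with a hypothetical function name: it walks through the classes until the cumulative frequency first reaches np/100 and then interpolates inside that class, as in the worked P60. With p = 50 it gives the grouped median.

```python
def grouped_percentile(classes, p):
    """Percentile from (lower, upper, frequency) triples.

    Finds the first class whose cumulative frequency is >= n*p/100,
    then applies l + (u - l) * (n*p/100 - F_before) / f.
    """
    n = sum(f for _, _, f in classes)
    target = n * p / 100
    cumulative_before = 0  # cumulative frequency of the preceding classes
    for lower, upper, f in classes:
        if cumulative_before + f >= target:
            return lower + (upper - lower) * (target - cumulative_before) / f
        cumulative_before += f
    raise ValueError("p must be between 0 and 100")

call_classes = [
    (2, 5, 3), (5, 8, 4), (8, 11, 11), (11, 14, 13),
    (14, 17, 9), (17, 20, 6), (20, 23, 2),
]
p60 = grouped_percentile(call_classes, 60)
median = grouped_percentile(call_classes, 50)
print(round(p60, 2), round(median, 2))
```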

Exam question 3
1. John, one of the part-time workers was told he falls on the
70th percentile. Calculate the value and explain what it
means.
PROCEDURE
1. Calculate the cumulative frequencies
2. Calculate which class the required percentile falls into by
using P =np/100
3. Once you have identified the class use the percentile formula
given in the tables book to calculate the value. Take CARE to
order the calculation correctly.

4 MARKS

Exam question 3
Hours per week Frequency (f) Cumulative F
2.1 - < 2.8 12 12
2.8 - < 3.5 13 25
3.5 - < 4.2 7 32
4.2 - < 4.9 5 37
4.9 - < 5.6 2 39
5.6 - < 6.3 1 40
Total 40

Position: np/100 = 40 × 70/100 = 28, which falls in the class 3.5 - < 4.2 (the first F ≥ 28 is 32)
P70 = 3.5 + [(4.2 − 3.5)(28 − 25)]/7
= 3.5 + 0.3
= 3.8

70% of the workers worked fewer overtime hours than John, i.e. fewer than 3.8 hrs.
30% of the workers worked more overtime hours than John, i.e. more than 3.8 hrs.

Confidence interval
– An interval is calculated around the sample
statistic
[Figure: a confidence interval drawn around the sample statistic, with the population parameter included in the interval.]

Confidence interval
– An upper and lower limit within which the
population parameter is expected to lie
– Limits will vary from sample to sample
– Specify the probability that the interval will
include the parameter
– Typically used: 90%, 95%, 99%
– Probability denoted by
• (1 – α) known as the level of confidence
• α is the significance level
Example:
Meaning of a 90% confidence interval:
90% of all possible samples taken from the
population will produce an interval that will
include the population parameter

• An interval estimate consists of a range of values
with an upper & lower limit
• The population parameter is expected to lie within
this interval with a certain level of confidence
• Limits of an interval vary from sample to sample
therefore we must also specify the probability that
an interval will contain the parameter
• Ideally probability should be as high as possible


SO REMEMBER
•We can choose the probability
•Probability is denoted by (1-α)
•Typical values are 0.9 (90%); 0.95 (95%) and 0.99 (99%)
•The probability is known as the LEVEL OF CONFIDENCE
•α is known as the SIGNIFICANCE LEVEL
•α corresponds to an area under a curve
•Since we take the confidence level into account when we
estimate an interval, the interval is called CONFIDENCE
INTERVAL


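Confidence interval for Population Mean, n ≥ 30
- population need not be normally distributed
- sampling distribution of the sample mean will be approximately normal

$CI(\mu)_{1-\alpha} = \bar{x} \pm Z_{1-\frac{\alpha}{2}} \dfrac{\sigma}{\sqrt{n}}$, if $\sigma$ is known
$CI(\mu)_{1-\alpha} = \bar{x} \pm Z_{1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}}$, if $\sigma$ is not known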
Example: 90% confidence interval
1 – α = 0,90
α = 0,10
α/2 = 0,05
[Figure: normal curve with the lower and upper confidence limits marked. The central area is the confidence level 1 – α = 0,90, so 90% of all sample means fall in this area. Each tail has area α/2 = 0,05; the two tail areas added together = α, i.e. 10%.]

• Confidence interval for Population Mean, n < 30
– For a small sample from a normal population where σ is
known, the normal distribution can be used.
– If σ is unknown we use s to estimate σ
– We need to replace the normal distribution with the t-distribution

$CI(\mu)_{1-\alpha} = \bar{x} \pm t_{n-1;\,1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}}$
[Figure: the standard normal curve compared with the t-distribution.]

• Example
– The manager of a small departmental store is concerned about
the decline of his weekly sales.
– He calculated the average and standard deviation of his sales for
the past 12 weeks: x̄ = R12 400 and s = R1 346
– Estimate with 99% confidence the population mean sales of the
departmental store.

$t_{n-1;\,1-\frac{\alpha}{2}} = t_{11;\,0{,}995} = 3{,}106$
$\bar{x} \pm t_{n-1;\,1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}} = 12400 \pm 3{,}106 \dfrac{1346}{\sqrt{12}} = 12400 \pm 1206{,}86 = [11193{,}14 \,;\, 13606{,}86]$

We are 99% confident the mean weekly sales will be between
R11 193,14 and R13 606,86.
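The interval arithmetic can be sketched in Python. This is a minimal sketch with a hypothetical function name; the critical value (3,106 for 11 degrees of freedom at 99%) is passed in as read from the t-table, since the Python standard library has no t-distribution.

```python
import math

def mean_confidence_interval(x_bar, s, n, critical_value):
    """x_bar +/- critical_value * s / sqrt(n); critical value comes from a table."""
    margin = critical_value * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Weekly sales example: x_bar = 12400, s = 1346, n = 12, t(11; 0.995) = 3.106
lower, upper = mean_confidence_interval(12400, 1346, 12, 3.106)
print(round(lower, 2), round(upper, 2))
```

The same function covers the large-sample case by passing a Z value such as 1.96 in place of the t value.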

• Confidence interval for Population proportion
– Each element in the population can be classified as a
success or failure
– Sample proportion $\hat{p} = \dfrac{\text{number of successes}}{\text{sample size}} = \dfrac{x}{n}$
– Proportion always between 0 and 1
– For large samples the sample proportion $\hat{p}$ is
approximately normal

$CI(p)_{1-\alpha} = \hat{p} \pm z_{1-\frac{\alpha}{2}} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$

Exam question 7
1. In a sample of 200 residents of Johannesburg, 120 reported
they believed the property taxes were too high. Develop a
95% confidence interval for the proportion of the
residents who believe the tax rate is too high. Interpret your
answer
2. The time it takes a mechanic to tune an engine in a sample of
20 tune ups is known to be normally distributed with a
sample mean of 45 minutes and a sample standard deviation
of 14 minutes. Develop a 95% confidence interval estimate
for the mean time it will take the mechanic for all engine
tune ups. Interpret your answer

15 MARKS

Exam question 7
PROCEDURE
1. Determine what measure you are looking at: mean,
proportion or standard deviation
2. Select appropriate formula based on 1. and sample size (t for
small sample sizes <30; z for larger sample sizes)
3. Put the numbers into the formula and calculate the
confidence intervals

Exam question 7
1.
Sample proportion $\hat{p} = \dfrac{\text{number of successes}}{\text{sample size}} = \dfrac{x}{n}$
$CI(p)_{1-\alpha} = \hat{p} \pm z_{1-\frac{\alpha}{2}} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
p̂ = 120/200 = 0.6
$z_{1-\frac{\alpha}{2}} = z_{0.975} = 1.96$
CI = 0.6 ± 1.96 √((0.6)(0.4)/200)
CI = 0.6 ± 0.07
0.53 < p < 0.67
At a confidence level of 95%, between 53% and 67% of
residents believe the tax rate is too high.
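The proportion interval can be checked in Python. This is a sketch with a hypothetical function name; it uses `statistics.NormalDist` from the standard library to look up the z value instead of a printed table, so it works with z ≈ 1.95996 rather than the rounded 1.96.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence):
    """p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n) for a large sample."""
    p_hat = successes / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = proportion_confidence_interval(120, 200, 0.95)
print(round(lower, 2), round(upper, 2))
```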

Exam question 7
2.
$CI(\mu)_{1-\alpha} = \bar{x} \pm t_{n-1;\,1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}}$
$t_{19;\,0.975} = 2.093$
= 45 ± 2.093 (14/√20)
= 45 ± 6.55
38.45 < µ < 51.55
At a confidence level of 95% the population average time to complete a
tune up is between 38.45 and 51.55 minutes.

STEPS OF A HYPOTHESIS TEST

Step 1 • State the null and alternative hypotheses

Step 2 • State the value of α

Step 3 • Calculate the value of the test statistic

Step 4 • Determine the critical value

Step 5 • Make a decision using decision rule or graph

Step 6 • Draw a conclusion


• Hypothesis test for Population Mean, n < 30
– If σ is unknown we use s to estimate σ
– We need to replace the normal distribution with
the t-distribution with (n - 1) degrees of
freedom

Testing H0: μ = μ0 for n < 30
Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$
Alternative hypothesis    Decision rule: Reject H0 if
H1: μ ≠ μ0    $|t| \ge t_{n-1;\,1-\alpha/2}$
H1: μ > μ0    $t \ge t_{n-1;\,1-\alpha}$
H1: μ < μ0    $t \le -t_{n-1;\,1-\alpha}$

• Hypothesis testing for Population proportion
– Sample proportion $\hat{p} = \dfrac{\text{number of successes}}{\text{sample size}} = \dfrac{x}{n}$
– Proportion always between 0 and 1

Testing H0: p = p0 for n ≥ 30
Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$
Alternative hypothesis    Decision rule: Reject H0 if
H1: p ≠ p0    $|z| \ge Z_{1-\alpha/2}$
H1: p > p0    $z \ge Z_{1-\alpha}$
H1: p < p0    $z \le -Z_{1-\alpha}$

Exam question 8
1. Oliver Tambo airport wants to test the claim that on
average cars remain in the short term car park area longer
than 42.5 minutes. The research team drew a random
sample of 24 cars and found that the average time that
these cars remained in the short term parking area was 40
minutes with a sample standard deviation of 2 minutes.
Test the claim at 10% level of significance and interpret.

2. The Gautrain Authority adds a bus route if more than 55%
of commuters indicate they would use the route. A
sample of 70 commuters revealed that 42 would use a
route from Sandton to Auckland Park. Does this route
meet the Gautrain criterion? Use a 0.05 significance level.

16 MARKS

Exam question 8
Procedure
1. State H0 and Ha
2. Determine the critical value from the
appropriate test table using α, and n
3. Compute the test statistic (t or z value)
4. Draw conclusion

Exam question 8
1.
State hypothesis
H0: µ = 42.5
Ha: µ > 42.5
Determine critical value
$t_{n-1;\,1-\alpha} = t_{23;\,0.9} = 1.319$
Reject H0 if the test statistic is ≥ 1.319
Calculate test statistic
$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}} = \dfrac{40 - 42.5}{2 / \sqrt{24}} = -6.12$
Since −6.12 < 1.319, do not reject H0.
At the 10% level of significance the sample does not support the claim that
cars remain in the short term parking area longer than 42.5 minutes on average.
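The test statistic and decision for this right-tailed t-test can be sketched in Python. The function name is hypothetical, and the critical value 1.319 is taken from the t-table as in the answer above, because the standard library does not provide t-distribution quantiles.

```python
import math

def t_statistic(x_bar, mu_0, s, n):
    """t = (x_bar - mu_0) / (s / sqrt(n))"""
    return (x_bar - mu_0) / (s / math.sqrt(n))

t = t_statistic(x_bar=40, mu_0=42.5, s=2, n=24)
critical_value = 1.319               # t(23; 0.90) read from the t-table
reject_h0 = t >= critical_value      # decision rule for Ha: mu > mu_0
print(round(t, 2), reject_h0)
```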

Exam question 8
2.
State hypothesis
H0: p = 0.55
Ha: p > 0.55
Determine critical value
α = 0.05, $Z_{1-\alpha} = 1.64$
Reject H0 if the test statistic z ≥ 1.64
Calculate test statistic
Sample proportion $\hat{p} = \dfrac{x}{n} = \dfrac{42}{70} = 0.6$
$z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \dfrac{0.6 - 0.55}{\sqrt{\dfrac{(0.55)(0.45)}{70}}} = 0.84$

Since 0.84 < 1.64, do not reject H0.
At the 0.05 significance level there is not enough evidence that more than 55%
of commuters would use the route, so the route does not meet the criterion.
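The proportion test can be sketched in Python. This is an illustration with a hypothetical function name; it looks up the critical value with `statistics.NormalDist` from the standard library, which gives about 1.645 where the table value used above is 1.64.

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, p_0, alpha):
    """Right-tailed z-test of H0: p = p_0 against Ha: p > p_0."""
    p_hat = successes / n
    z = (p_hat - p_0) / math.sqrt(p_0 * (1 - p_0) / n)
    critical_value = NormalDist().inv_cdf(1 - alpha)
    return z, critical_value, z >= critical_value

z, critical_value, reject_h0 = proportion_z_test(42, 70, 0.55, 0.05)
print(round(z, 2), round(critical_value, 3), reject_h0)
```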

Coefficient of correlation
• The coefficient of correlation is used to measure the
strength of association between two variables.
• The coefficient values range between -1 and 1.
– If r = -1 (negative association) or r = +1 (positive
association) every point falls on the regression
line.
– If r = 0 there is no linear pattern.
• The coefficient can be used to test for linear
relationship between two variables.

[Figure: six scatter plots of Y against X]
Perfect positive: r = +1
High positive: r = +0,9
Low positive: r = +0,3
Perfect negative: r = −1
High negative: r = −0,8
No correlation: r = 0

Exam question 10
The cost of repairing cars that were involved in accidents is one reason
that insurance premiums are so high. In an experiment 5 cars were
driven into a wall. The speeds were varied between 20km/hr and
80km/hr (X). The costs of repair (Y) were estimated and listed below:
SPEED (km/h) (X) COST OF REPAIR (R’000) (Y)

20 3
30 5
40 8
60 24
80 34

1. Use calculator to calculate coefficient of correlation. Interpret your
answer
2. Calculate and interpret the coefficient of determination for this
data
3. Use your calculator to construct regression line equation and
predict repair cost at 50km/h

10 MARKS

Exam question 10
1. Put data into calculator
2. Select regression function and select r
3. Calculate coefficient of determination
= r² × 100%
4. Interpret results
5. Using Y = A + BX select regression function on
calculator and determine values for A & B
6. Put x = 50 into formula and calculate result

Exam question 10
1. r = 0.98
There is a very strong positive relationship between the
repair cost and speed.
2. r² × 100% = 0.98² × 100 = 96%
96% of the variation in the cost of repair is
explained by the variation in the speed at which
the car crashed
3. Y = −10.7 + 0.55X
X = 50: Y = 16.8, i.e. a predicted repair cost of about R16 800
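For readers without a calculator's regression mode, the same quantities can be computed with a short Python sketch. The function name is hypothetical and the formulas are the standard least-squares ones, not something taken from the slides. Because it works with unrounded values, it gives r² of about 97% and a prediction of about 17.0 at 50 km/h, slightly different from the answers above, which are based on the rounded r = 0.98 and rounded coefficients.

```python
import math

def linear_fit(xs, ys):
    """Least-squares line y = a + b*x and the correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    r = sxy / math.sqrt(sxx * syy)     # coefficient of correlation
    return a, b, r

speed = [20, 30, 40, 60, 80]           # km/h (X)
cost = [3, 5, 8, 24, 34]               # repair cost in R'000 (Y)
a, b, r = linear_fit(speed, cost)
predicted = a + b * 50                 # predicted cost at 50 km/h
print(round(a, 1), round(b, 2), round(r, 2), round(r * r * 100), round(predicted, 1))
```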
