Descriptive Statistics
1. Descriptive Statistics
Krupnik Estate Agents
Anthony J. Evans
Professor of Economics, ESCP Europe
www.anthonyjevans.com
(cc) Anthony J. Evans 2019 | http://creativecommons.org/licenses/by-nc-sa/3.0/
3. Weekly prices of studio apartments in West Hampstead (2)
• The first task is to sort the data set into ascending order
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
4. Foundations of descriptive statistics
Measures of data
• Location: a “typical” or average value, used to summarise the distribution
• Dispersion: the spread or variability of the data, used to appreciate the differences in the data
5. Measures of Location: Mean
• The mean of a data set is the summation of all individual
values, divided by the number of observations
Sample: $\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$
Population: $\mu = \frac{\sum x_i}{N}$
Notice the use of Greek/upper case for populations and Latin/lower case for samples
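A minimal Python sketch of the sample mean for this data set, assuming the 70 prices from slide 3 are treated as a sample. The names `prices` and `sample_mean` are illustrative and not part of the original slides.

```python
# Weekly prices of studio apartments in West Hampstead (slide 3), already sorted
prices = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def sample_mean(xs):
    """x-bar = (sum of all x_i) / n"""
    return sum(xs) / len(xs)

total, n = sum(prices), len(prices)
x_bar = sample_mean(prices)
print(total, n, x_bar)  # 34356 70 490.8
```

The same function gives the population mean µ when the list holds every member of the population; only the notation changes.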
6. Measures of Location: Median
• The median of a data set is the value that divides the lower half of
the distribution from the higher half
• The median is the middle observation
– i.e. the (n+1)/2th observation
– In this case, (70 + 1)/2 = 35.5, so the median lies halfway between the 35th and 36th observations
• If there is an even number of observations, take the mean of the two middle values
• Median = (475 + 475)/2 = 475
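The (n+1)/2 rule can be sketched in Python as follows, assuming the input is already sorted in ascending order; `median` here is a hypothetical helper written for illustration (the standard library also offers `statistics.median`).

```python
# Weekly prices from slide 3, already in ascending order
prices = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def median(sorted_xs):
    """Middle value of sorted data: the (n+1)/2 th observation."""
    n = len(sorted_xs)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: (n+1)/2 is a whole position, which is index mid (0-based)
        return sorted_xs[mid]
    # Even n: average the n/2 th and (n/2 + 1) th observations
    return (sorted_xs[mid - 1] + sorted_xs[mid]) / 2

print(median(prices))  # 475.0 (the 35th and 36th values are both 475)
```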
7. Measures of Location: Mode
• The mode of a data set is the value that occurs with the
greatest frequency.
• If the data have exactly two modes, the data are bimodal
• If the data have more than two modes, the data are
multimodal
• Mode = 450 (it occurs 7 times, more often than any other value)
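A short sketch of finding every mode with `collections.Counter`, so that bimodal and multimodal data are reported rather than silently reduced to one value. The variable names and the "unimodal" label for a single mode are illustrative choices, not from the slides.

```python
from collections import Counter

# Weekly prices from slide 3
prices = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

counts = Counter(prices)            # value -> number of occurrences
top = max(counts.values())          # the greatest frequency
modes = sorted(v for v, c in counts.items() if c == top)

if len(modes) == 1:
    label = "unimodal"
elif len(modes) == 2:
    label = "bimodal"
else:
    label = "multimodal"

print(modes, top, label)  # [450] 7 unimodal
```

In Python 3.8+, `statistics.multimode` also returns all of the most common values.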
9. Measures of Location: Percentiles
• The median is also known as the 50th percentile
• The pth percentile of a data set is a value such that p percent of the items take on this value or less, and (100 - p) percent of the items take on this value or more
– Arrange the n items in ascending order
– Compute index i, the position of the pth percentile
– If i is not an integer, round up. The pth
percentile is the value in the ith position.
– If i is an integer, the pth percentile is the
average of the values in positions i and i+1
• i is the position of the pth percentile:
$i = \left(\frac{p}{100}\right) n$
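The four steps above can be sketched in Python as below. This is a hypothetical helper that follows the slide's rule with 1-based positions, assuming sorted input and 0 < p < 100; other conventions (for example the interpolating `statistics.quantiles`) may give slightly different values.

```python
import math

# Weekly prices from slide 3, already in ascending order
prices = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def percentile(sorted_xs, p):
    """pth percentile using i = (p/100) n and 1-based positions."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    n = len(sorted_xs)
    if (p * n) % 100 == 0:
        # i is an integer: average the values in positions i and i + 1
        i = int(p * n // 100)
        return (sorted_xs[i - 1] + sorted_xs[i]) / 2
    # i is not an integer: round up and take the value in position i
    i = math.ceil(p * n / 100)
    return sorted_xs[i - 1]

print(percentile(prices, 25), percentile(prices, 50), percentile(prices, 75))
# 445 475.0 525
```

With p = 50 the rule reproduces the median, since i = 35 is an integer and positions 35 and 36 are averaged.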
12. GDP of UN member countries
[Figure: bar chart of GDP of UN member countries; Mean = $267,015, Median = $18,171]
On average, how many legs do English people have?
13. Measures of Dispersion
• In choosing supplier A or supplier B, we should consider not only the average delivery time for each, but also the variability in delivery time for each
[Figure: frequency distributions for supplier A (Mean = 0) and supplier B (Mean = 0)]
14. Measures of Dispersion: Range
• The range of a data set is the difference between the
largest and smallest data values
• Range = largest value - smallest value
• Range = 615 - 425 = 190
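The range is a one-line calculation; this sketch uses the name `data_range` (an illustrative choice) to avoid shadowing Python's built-in `range`.

```python
# Weekly prices from slide 3
prices = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

# Range = largest value - smallest value
data_range = max(prices) - min(prices)
print(data_range)  # 190
```

Because it uses only the two extreme values, a single unusual observation can change the range dramatically.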
15. Measures of Dispersion: Interquartile range
• The interquartile range of a data set is the difference
between the first and third quartiles
• Q1 = 25th percentile: i = (25/100) × 70 = 17.5, rounded up to 18, so Q1 = 445
• Q3 = 75th percentile: i = (75/100) × 70 = 52.5, rounded up to 53, so Q3 = 525
• Interquartile range = 525 - 445 = 80
$i = \left(\frac{p}{100}\right) n$
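A sketch of the interquartile range, reusing a hypothetical `percentile` helper that applies the slide's i = (p/100) n rule (1-based positions, round up when i is not an integer). Input is assumed to be sorted.

```python
import math

# Weekly prices from slide 3, already in ascending order
prices = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def percentile(sorted_xs, p):
    """pth percentile using i = (p/100) n; assumes sorted input, 0 < p < 100."""
    n = len(sorted_xs)
    if (p * n) % 100 == 0:
        # Integer i: average the values in positions i and i + 1
        i = int(p * n // 100)
        return (sorted_xs[i - 1] + sorted_xs[i]) / 2
    # Non-integer i: round up and take the value in position i
    return sorted_xs[math.ceil(p * n / 100) - 1]

q1 = percentile(prices, 25)  # position 18
q3 = percentile(prices, 75)  # position 53
iqr = q3 - q1
print(q1, q3, iqr)  # 445 525 80
```

Unlike the range, the IQR ignores the lowest and highest quarters of the data, so it is not driven by the extreme values.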
16. Measures of Dispersion: Variance
• The variance is a measure of variability that utilizes all the data.
• It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample and $\mu$ for a population)
• The variance is the average of the squared differences between each data value and the mean (for a sample, the sum of squared differences is divided by n - 1 rather than n)
Sample: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Population: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
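A minimal sketch of the two variance formulas, assuming the 70 prices can be treated either as a sample (divide by n - 1) or, purely for comparison, as a whole population (divide by N). The function names are illustrative; the results are cross-checked against Python's `statistics.variance` and `statistics.pvariance`.

```python
import math
import statistics

# Weekly prices from slide 3
prices = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def sample_variance(xs):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1)"""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

def population_variance(xs):
    """sigma^2 = sum((x_i - mu)^2) / N, treating xs as the whole population"""
    N = len(xs)
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N

s2 = sample_variance(prices)
sigma2 = population_variance(prices)
print(round(s2, 2), round(sigma2, 2))  # 2996.16 2953.36

# Cross-check against the standard library
assert math.isclose(s2, statistics.variance(prices))
assert math.isclose(sigma2, statistics.pvariance(prices))
```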
17. Measures of Dispersion: Standard Deviation
• The standard deviation of a data set is the square root of
the variance.
• It is more easily comparable to the mean than the variance
– standard deviation measures the spread about the mean
using the original (not squared) scale
• It ties into the Normal Distribution
Sample: $s = \sqrt{s^2} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Population: $\sigma = \sqrt{\sigma^2}$
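A sketch of the sample standard deviation as the square root of the sample variance, assuming the 70 prices are a sample; `sample_std` is an illustrative name, and `statistics.stdev` is used only as a cross-check.

```python
import math
import statistics

# Weekly prices from slide 3
prices = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def sample_std(xs):
    """s = sqrt(sum((x_i - x_bar)^2) / (n - 1))"""
    n = len(xs)
    x_bar = sum(xs) / n
    s2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    return math.sqrt(s2)

s = sample_std(prices)
print(round(s, 2))  # 54.74, in the same units as the prices themselves
```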
19. Measures of Dispersion: Examples
• Variance
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \approx 2{,}996$
• Standard deviation
$s = \sqrt{s^2} = \sqrt{2{,}996} \approx 54.74$
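As a hand-check of these figures, the sum of squared deviations can be obtained from the shortcut identity $\sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2$. The value $\sum x_i^2 = 17{,}068{,}660$ is computed here from the 70 prices on slide 3; it does not appear on the original slides.

```latex
\begin{aligned}
n &= 70, \qquad \sum x_i = 34{,}356, \qquad \sum x_i^2 = 17{,}068{,}660 \\
\sum (x_i - \bar{x})^2 &= \sum x_i^2 - n\bar{x}^2 = 17{,}068{,}660 - 70 \times 490.8^2 = 206{,}735.2 \\
s^2 &= \frac{206{,}735.2}{70 - 1} \approx 2{,}996.16 \\
s &= \sqrt{2{,}996.16} \approx 54.74
\end{aligned}
```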
20. Measures of Dispersion: z Score
• The z-score is the standardised value
• It denotes the number of standard deviations a data value $x_i$ is from the mean
• A data value less than the sample mean will have a z-score less than zero
• A data value greater than the sample mean will have a z-score greater than zero
$z_i = \frac{x_i - \bar{x}}{s}$
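A sketch of standardising every observation with the sample mean and sample standard deviation; `z_score` is an illustrative helper. The cheapest and most expensive apartments are shown as examples.

```python
import math

# Weekly prices from slide 3
prices = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(prices)
x_bar = sum(prices) / n                                          # sample mean
s = math.sqrt(sum((x - x_bar) ** 2 for x in prices) / (n - 1))   # sample std dev

def z_score(x):
    """Number of sample standard deviations x lies from the sample mean."""
    return (x - x_bar) / s

z_scores = [z_score(x) for x in prices]
print(round(z_score(425), 2), round(z_score(615), 2))  # -1.2 2.27
```

Values below the mean get negative z-scores and values above it get positive ones; across the whole data set the z-scores sum to zero (up to rounding).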
22. Measures of Dispersion: Outliers
• An outlier is an unusually small or unusually large value in
a data set
– It might be an incorrectly recorded data value
– It might be a data value that was incorrectly included
in the data set
– It might be a correctly recorded data value that
belongs in the data set!
• A data value with a z-score less than -3 or greater than +3
might be considered an outlier
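A sketch of the |z| > 3 screening rule. `z_outliers` is a hypothetical helper, and the value 6150 is an invented mistyping of 615 used only to show a flagged case. Flagged values are candidates for inspection, since a correctly recorded value may still belong in the data set.

```python
import math

# Weekly prices from slide 3
prices = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def z_outliers(xs, threshold=3.0):
    """Return the values whose sample z-score is below -threshold or above +threshold."""
    n = len(xs)
    x_bar = sum(xs) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    return [x for x in xs if abs((x - x_bar) / s) > threshold]

print(z_outliers(prices))           # [] (the largest |z| is about 2.27)
print(z_outliers(prices + [6150]))  # [6150] (hypothetical typo for 615)
```

In very small samples the rule can never fire, because no |z| can exceed (n - 1)/√n, so it is most useful for reasonably large data sets like this one.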
23. Summary
• There are two main ways to get a feel for a set of numbers (a distribution): location and dispersion
• The mean and the standard deviation are the most frequently used measures of location and dispersion, but it’s important to understand the alternatives