This presentation introduces statistics and shows why it matters to planners. It covers the various ways in which data are categorised and explains how to select a chart type depending on the kind of information you want to present.
3.
Statistics is the science of
collecting,
classifying,
presenting, and
interpreting
numerical data
What is Statistics?
4.
The numerical information that planners collect is
meaningless if it cannot be presented in a proper and
easily accessible way.
Planners need to learn statistics:
To analyse the present situation
To plan for the people and their needs
To project future demand
Why Planners need to learn
statistics?
5.
Primary Data
Data collected by the researcher/ investigator
himself/ herself for a specific purpose.
Secondary Data
Data collected by someone else for some other
purpose
(Data not collected/surveyed by the investigator but
being used by the investigator).
Data categorisation - Based on
collection mode
6. Discrete data can only take particular values. There may
potentially be an infinite number of those values, but each
is distinct and there is no grey area in between: the data
have distinguishable gaps between values.
Discrete data can be Counted.
Discrete data can be numeric -- like number of people -- but
it can also be categorical -- like male or female.
Eg: Population of village – 9,515
Number of languages xyz can speak - 4
Another way to categorise data-
Discrete & Continuous Data
Source: Pitman, E. J. G. 1979. Some basic theory for statistical inference.
7. Continuous data are not restricted to defined
separate values, but can occupy any value over a
continuous range. Between any two continuous data
values there may be an infinite number of others.
Continuous data can be Measured. It is the data that
can be measured as finely as is practical (no spaces
between values).
Eg-
Age of the people in xyz village: 1 day, 15 years 2 months, 70 years
Length of the tree
Speed of the train
Temperature
8.
Yet another way to categorise data-
Levels of Measurement
Qualitative
Nominal
Lowest level of measurement
No natural order
Ordinal
Ordered categories
Relative ranking
Unknown distance between
rankings
Quantitative
It includes things that can be
measured
Interval
Ordered categories
Equal distance between
values
Ratio
Most precise
Ordered
Exact Values, Equal intervals
Have a zero point
9. Nominal scales are used for labeling variables, without any
quantitative value. “Nominal” scales could simply be called
“labels.”
Which City do you live in?
A - Delhi
B - Mumbai
C - Kolkata
D – Chennai
Where do you live?
A – North of the equator
B - South of the equator
Nominal
What is your gender?
A- Male
B- Female
10. On an ordinal scale, it is the order of the values that is
important and significant; the differences between them are not
really known.
Ordinal scales are typically measures of non-numeric concepts
like satisfaction, happiness, discomfort, etc.
Example:
How satisfied are you with the public transport system of your
city?
1 - Very Dissatisfied
2 - Somewhat Dissatisfied
3 - Neutral
4 - Somewhat Satisfied
5 - Very Satisfied
Ordinal
11.
Interval scales are numeric scales in which we know not
only the order, but also the exact differences between
the values.
Example:
The difference between 30 and 40 degrees is a
measurable 10 degrees, as is the difference between
20 and 30 degrees.
Your test scores
Interval
12. A ratio is used to relate or compare two quantities.
Two quantities can be compared only if they are in the
same unit.
A ratio is dimensionless, i.e. we do not attach any unit to it.
Example 1: Sex ratio = Population of females / Population of males
Sex ratio of India, 2011: 940 females for every 1,000 males
Ratio
13. Example 2:
The height to width ratio of the Indian Flag is 2:3.
So for every 2 (inches, meters, whatever) of
height there should be 3 of width.
If we made the flag 20 inches high, it should be 30
inches wide.
If we made the flag 40 cm high, it should be 60 cm
wide (which is still in the ratio 2:3)
Example 3:
Total strength of your class: 30
Number of girls: 6
Number of boys: 24
Ratio = 6/24 = 1/4
The ratio of girls to boys enrolled for the M. Plan course is 1:4.
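The arithmetic in Examples 2 and 3 can be checked with a short Python sketch. It assumes the standard-library `fractions` module is an acceptable way to reduce a ratio to lowest terms; the helper name `flag_width` is made up for this illustration.

```python
from fractions import Fraction

# Example 3: girls to boys in a class of 30.
girls, boys = 6, 24
ratio = Fraction(girls, boys)  # Fraction reduces 6/24 to lowest terms
print(f"girls : boys = {ratio.numerator}:{ratio.denominator}")

# Example 2: the flag's height-to-width ratio is fixed at 2:3.
def flag_width(height):
    """Width that keeps a 2:3 height-to-width ratio (same unit as height)."""
    return height * Fraction(3, 2)

print(flag_width(20))  # flag 20 inches high
print(flag_width(40))  # flag 40 cm high
```

Because a ratio is dimensionless, the same function works for inches and centimetres; only the unit of the input carries over to the output.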
14.
Levels of Measurement
Level of measurement: the measurement values can be…
Nominal scale: distinguished
Ordinal scale: distinguished and ranked
Interval scale: distinguished, ranked and measured with constant units of measurement
Ratio scale: distinguished, ranked, measured with constant units of measurement, and have a zero point
15. A proportion refers to the fraction of the total that
possesses a certain attribute.
Proportion is the decimal form of a percentage, so 100%
would be a proportion of 1.000; 50% would be a
proportion of 0.500, etc.
Example:
In a recent poll of 200 households, it was found that 152
households had at least one computer. Estimate the proportion
of households in the population that have at least one
computer.
p= 152/200 = 0.76
PROPORTION
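As a quick check of the household example above, the proportion and its percentage form can be computed in a few lines of Python. This is only a sketch; the variable names are chosen for illustration.

```python
# Poll figures from the example: 152 of 200 households own a computer.
households_surveyed = 200
households_with_computer = 152

# Proportion = part / total, a value between 0 and 1.
p = households_with_computer / households_surveyed
print(p)

# The same value expressed as a percentage.
print(f"{p:.1%}")
```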
16.
Choosing Chart Types
Charts speak directly to the eye and are very effective in creating a picture in
the reader’s mind. Therefore the person who creates the chart has a special
responsibility.
Good charts convey information, bad charts convey disinformation. You have
to create your chart in such a way that you don’t mislead the reader.
-Anders Wallgren,
Author of Graphing Statistics & Data, Creating Better Charts
17. A column chart displays data as vertical bars.
It is used when we want to illustrate variable values which
are distinct, i.e. with qualitative or discrete variables. Bar
charts use spaces between the bars to emphasise this.
Column Chart/Bar Chart
18.
19. The horizontal bar chart is exactly the same as a
column chart, except that the x-axis and y-axis are switched.
Horizontal bar charts are to be preferred to vertical ones in
two situations:
Variable values with long names/data
Many variable values
Horizontal Bar Chart
20. Variable values with long names/ data
Proportion of people with higher education according to parents'
occupational group, 1987
23.
A dot chart or dot plot is a statistical chart consisting of
data points plotted on a fairly simple scale, typically
using filled-in circles. A dot plot is similar to a bar
graph because the height of each “bar” of dots is equal
to the number of items in a particular category.
They are useful for highlighting clusters and gaps, as
well as outliers.
Dot chart
24.
25. We use histograms in the same situations where we use bar
charts, i.e. to show quantities. However, the variable to be
illustrated is continuous. The freestanding bars are replaced with
areas which are placed right next to one another.
Histogram
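The step that separates a histogram from a bar chart is grouping a continuous variable into adjacent class intervals. The sketch below shows that grouping in plain Python and prints a rough text histogram. The ages are invented for illustration, and the 20-year bin width is an arbitrary assumption.

```python
# Invented ages in years (a continuous variable); not real survey data.
ages = [0.5, 3.2, 7.9, 12.0, 15.2, 18.4, 24.1, 29.9, 35.0, 41.7, 58.3, 70.0]
bin_width = 20  # assumed class width in years

# Count how many values fall in each interval [lower, lower + bin_width).
counts = {}
for age in ages:
    lower = int(age // bin_width) * bin_width
    counts[lower] = counts.get(lower, 0) + 1

# Adjacent intervals with no gaps between them, one row per bin.
for lower in sorted(counts):
    print(f"{lower:>2}-{lower + bin_width:<3} {'#' * counts[lower]}")
```

Each interval includes its lower bound and excludes its upper bound, so every value lands in exactly one bin.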
26.
27.
A pie chart is a type of graph in which a circle is divided
into sectors that each represent a proportion of the whole.
Pie Chart
28.
29.
Line charts are normally used
for describing developments
over time.
Since time is continuous the
different values are joined by
lines. Line charts can also be
an alternative to histograms.
Line Chart
Births and deaths in Sweden 1940-90
30. A scatterplot is used to graphically represent the
relationship between two variables.
Each scatterplot has a horizontal axis (x-axis) and a vertical
axis (y-axis). One variable is plotted on each axis.
A scatter chart draws a single point for each data point in
a series, without connecting them.
Example of Variables that can be plotted
Scatter chart
31.
Positive Correlation:
Both variables move in the same
direction. In other words, as one
variable increases, the other variable
also increases. As one variable
decreases, the other variable also
decreases.
e.g., years of education and yearly
salary are positively correlated.
Types of Correlation
32.
Negative correlation:
The variables move in opposite
directions. As one variable increases,
the other variable decreases. As one
variable decreases, the other variable
increases.
e.g., hours spent sleeping and hours
spent awake are negatively
correlated.
33.
No Correlation:
It means that there is no apparent
relationship between the two variables. For
example, there is no correlation between
shoe size and salary.
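The direction of a correlation can also be expressed as a number. The slides do not name a specific measure, so the sketch below assumes Pearson's correlation coefficient, one common choice: values near +1 indicate positive correlation, values near -1 negative correlation, and values near 0 little or no linear relationship. All data are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented data: years of education vs. yearly salary (arbitrary units).
education_years = [10, 12, 16, 18]
salary = [20, 25, 40, 45]
r_positive = pearson_r(education_years, salary)

# Hours asleep and hours awake always add up to 24.
hours_asleep = [6, 7, 8, 9]
hours_awake = [24 - h for h in hours_asleep]
r_negative = pearson_r(hours_asleep, hours_awake)

print(r_positive, r_negative)
```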
34.
35.
The Pareto principle (also known as the 80–20 rule) states that,
for many events, roughly 80% of the effects come from 20% of
the causes.
The management consultant Joseph M. Juran suggested the
principle and named it after the Italian economist Vilfredo Pareto,
who showed that approximately 80% of the land in Italy was
owned by 20% of the population. Pareto then carried out surveys
on a variety of other countries and found a similar distribution.
The purpose of the Pareto chart is to highlight the most
important among a (typically large) set of factors. It is used to
identify the predominant causes of a problem.
Pareto Chart/Diagram
36. A Pareto chart is a type of chart that contains both bars and
a line graph, where individual values are represented in
descending order by bars, and the cumulative total is
represented by the line.
Other Examples
The 1992 United Nations Development Programme report
showed that the distribution of global income is very uneven,
with the richest 20% of the world's population controlling
82.7% of the world's income.
Microsoft noted that by fixing the top 20% of the
most-reported bugs, 80% of the related errors and crashes in a
given system would be eliminated.
80% of the traffic occurs during 20% of the time
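The two ingredients of a Pareto chart described above, bars in descending order and a cumulative line, come down to a sort and a running total. The sketch below prepares that data in plain Python without drawing the chart. The complaint categories and counts are invented for illustration.

```python
# Invented complaint counts by cause; not real data.
complaints = {
    "Potholes": 45,
    "Street lighting": 10,
    "Waste collection": 30,
    "Signage": 5,
    "Drainage": 10,
}

total = sum(complaints.values())

# Bars: individual values in descending order.
ordered = sorted(complaints.items(), key=lambda kv: kv[1], reverse=True)

# Line: cumulative total as a percentage of the whole.
cumulative = 0
rows = []
for cause, count in ordered:
    cumulative += count
    rows.append((cause, count, 100 * cumulative / total))

for cause, count, cum_pct in rows:
    print(f"{cause:<18}{count:>4}{cum_pct:>8.1f}%")
```

Reading down the cumulative column shows how few causes account for most of the total, which is what the chart is meant to highlight.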