Graphical presentation of data

What is a variable?

In statistics, a variable has two defining characteristics:

A variable is an attribute that describes a person, place, thing, or idea.

The value of the variable can "vary" from one entity to another.

For example, a person's hair color is a potential variable, which could
have the value of "blond" for one person and "brunette" for another.

Qualitative vs. Quantitative Variables

Variables can be classified as qualitative (also called categorical;
for example age group, Likert scale, race) or quantitative (also called
numeric).

Examples of types of data

Quantitative

• Continuous: blood pressure, height, weight, age
• Discrete: number of children, number of attacks of asthma per week

Categorical

• Ordinal (ordered categories): grade of breast cancer; better, same, worse; disagree, neutral, agree
• Nominal (unordered categories): sex (male/female); alive or dead; blood group (O, A, B, AB)

Graphical presentation of data

• is better understood and appreciated by humans;
• brings out the hidden patterns and trends of complex data sets.

Thus the reason for displaying data graphically is twofold:
• Investigators can have a better look at the information
collected and the distribution of the data.
• The information can be communicated to others quickly.

We shall discuss in detail some of the commonly used graphical
presentations.

Bar Charts
Bar charts are used for qualitative (categorical) variables.

The categories of the variable are placed along the X-axis
(horizontal), one bar per category, and the height of each bar is
equal to the frequency or percentage of that category, which is
plotted along the Y-axis (vertical).
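
As a minimal sketch of what a bar chart plots, the snippet below counts the frequency and percentage of each category and prints a text-mode bar per category. The blood-group sample is hypothetical and the helper name `frequency_table` is an assumption made for illustration.

```python
from collections import Counter

def frequency_table(observations):
    """Return {category: (frequency, percentage)} for a categorical variable."""
    counts = Counter(observations)
    total = len(observations)
    return {cat: (n, n * 100 / total) for cat, n in counts.items()}

# Hypothetical sample of blood groups (not taken from the slides)
blood_groups = ["O", "A", "O", "B", "AB", "O", "A", "B", "O", "A"]
table = frequency_table(blood_groups)

# One bar per category; bar length equals the frequency
for cat in sorted(table, key=lambda c: -table[c][0]):
    freq, pct = table[cat]
    print(f"{cat:>2} | {'#' * freq} {freq} ({pct:.0f}%)")
```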

Pie Chart

Another interesting method of displaying categorical (qualitative)
data is a pie diagram, also called a circular diagram.

The angle of each sector is X/100 × 360 degrees, where X is the
percentage of observations in that category.

A pie diagram works best when the total number of categories is
between 2 and 6.

If there are more than 6 categories, try to reduce them by
“clubbing” (merging) categories; otherwise the diagram becomes
overcrowded.
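
The sector angles of a pie diagram can be sketched as follows, using the rule angle = X/100 × 360 where X is the percentage for a category. The percentages and the helper name `pie_angles` are hypothetical, chosen only for illustration.

```python
def pie_angles(percentages):
    """Convert category percentages into sector angles in degrees."""
    return {label: pct * 360 / 100 for label, pct in percentages.items()}

# Hypothetical blood-group percentages (not taken from the slides)
blood_group_pct = {"O": 40, "A": 30, "B": 20, "AB": 10}
angles = pie_angles(blood_group_pct)

for label, angle in angles.items():
    print(f"{label:>2}: {angle:.0f} degrees")
```

When the percentages add up to 100, the angles add up to a full circle of 360 degrees.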

Stem-and-leaf plots
This presentation is used for quantitative data.

To construct a stem-and-leaf plot, we divide each
value into a stem component and a leaf component.

The digit in the tens place becomes the stem
and the digit in the units place becomes the leaf.

It is useful for quickly assessing whether the data
follow a “normal” distribution, by seeing whether
the stem-and-leaf plot shows a bell shape.

For example, consider a sample of 10 values of age
in years: 21, 42, 05, 11, 30, 50, 28, 27, 24, 52.
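
A minimal sketch of the construction for the ten ages above: each value is split into a tens digit (stem) and a units digit (leaf). The function name `stem_and_leaf` is an assumption for illustration, and the sketch only handles non-negative whole numbers.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group the units digits (leaves) under their tens digits (stems)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    return dict(stems)

ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]
plot = stem_and_leaf(ages)

# Print every stem from the smallest to the largest, including empty ones
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

For this sample the longest row is the stem 2 (ages in the twenties), which is where the observations are most concentrated.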

Histogram
A histogram is used for quantitative continuous data. On the X-axis
we plot exclusive class intervals (each interval includes its lower
limit but excludes its upper limit, e.g. 10-20, 20-30), and on the
Y-axis we plot the frequencies.

The difference between a bar chart and a histogram is that, because
a histogram represents quantitative data measured on a continuous
scale, there are no gaps between the bars.
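
As a rough sketch of how the frequencies behind a histogram are obtained, the snippet below counts values in exclusive class intervals, reusing the ten ages from the stem-and-leaf example. The function name `class_frequencies`, the class width of 10 and the number of classes are assumptions for illustration.

```python
def class_frequencies(values, start, width, n_classes):
    """Count values in exclusive class intervals [lower, upper)."""
    counts = [0] * n_classes
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts

ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]
freq = class_frequencies(ages, start=0, width=10, n_classes=6)

# Adjacent classes share a boundary, so the bars have no gaps between them
for i, count in enumerate(freq):
    lower = i * 10
    print(f"{lower:2d}-{lower + 10:2d} | {'#' * count}")
```

Note that the value 30 is counted in the class 30-40, not 20-30, because the upper limit of each class is excluded.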

Box-and-Whisker plot
A box-and-whisker plot conveys a large amount of
information to the audience at a glance.

A box-and-whisker plot can be useful
for handling many data values.

It allows people to explore data and
to draw informal conclusions when
two or more variables are present.

It shows only certain summary statistics
rather than all the data.

The box-and-whisker plot is a visual representation of the
five-number summary.

The five-number summary consists of the median, the quartiles
(lower quartile Q1 and upper quartile Q3), and the smallest and
greatest values in the distribution.

[Figure: box-and-whisker plot labelled Minimum, Q1, Median, Q3 and
Maximum, with the Range and the IQR marked.]

Thus a box-and-whisker plot displays

• the center,
• the spread,
• the overall range of the distribution.
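
The five numbers behind a box-and-whisker plot can be sketched with Python's standard `statistics` module, here for the ten ages from the stem-and-leaf example. Quartile conventions differ between textbooks and software; this sketch assumes the default ("exclusive") method of `statistics.quantiles`, so other tools may report slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]
minimum, q1, median, q3, maximum = five_number_summary(ages)

print("Minimum:", minimum)
print("Q1:     ", q1)
print("Median: ", median)
print("Q3:     ", q3)
print("Maximum:", maximum)
```

The box is drawn from Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum.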

Scatter Diagram
A scatter diagram gives a quick visual
display of the association between two
variables, both of which are measured on
a numerical continuous or numerical
discrete scale (that is, both are quantitative).

The figure shows at a glance that weight
and age are associated: as age increases,
weight increases.

Be careful to record the dependent
variable along the vertical (Y) axis and the
independent variable along the
horizontal (X) axis.

In this example weight is
dependent on age (as age
increases weight is likely to
increase) but age is not dependent
on weight (if weight increases, age
will not necessarily increase).

Thus, weight is the dependent
variable, and has been plotted on Y
axis while age is the independent
variable, plotted along X axis.

Correlation coefficient

The degree of association is measured by
a correlation coefficient, denoted by r.

It is sometimes called Pearson's
correlation coefficient after its originator
and is a measure of linear association.

The correlation coefficient is measured on a
scale that varies from +1 through 0 to -1.

Complete correlation between two variables is
expressed by either +1 or -1.
• When one variable increases as the other increases, the
correlation is positive (coffee vs. wakefulness).
• When one decreases as the other increases, it is negative
(Old is gold!).
• Complete absence of correlation is represented by 0.

A perfect correlation of ± 1
occurs only when the data
points all lie exactly on a
straight line.

A correlation greater than 0.8 in absolute value would be
described as strong, whereas a correlation less than 0.5 in
absolute value would be described as weak.
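
A minimal sketch of how Pearson's r can be computed from paired observations, using the usual definition (sum of products of deviations divided by the square root of the product of the sums of squared deviations). The age and weight values are hypothetical and the function name `pearson_r` is an assumption for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical ages (years) and weights (kg), not taken from the slides
ages = [2, 4, 6, 8, 10]
weights = [12, 15, 21, 24, 29]
r = pearson_r(ages, weights)
print(f"r = {r:.3f}")

# Points lying exactly on a straight line give r = +1 or r = -1
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(pearson_r([1, 2, 3], [6, 4, 2]))
```

The function divides by zero if either variable is constant, since r is undefined in that case.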

Correlation coefficient v/s Regression analysis

When the objective is to determine association or the
strength of relationship between two such variables, we use
the correlation coefficient (r).

If the objective is to quantify and describe the existing
relationship with a view to prediction, we use regression
analysis.

Regression is used extensively in making predictions based on
finding unknown Y values from known X values.

Multiple regression is the same as regression except that it
attempts to predict Y from two or more independent X variables.
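
Predicting an unknown Y from a known X can be sketched with a simple least-squares line, Y = intercept + slope × X. The slides do not name a fitting method, so ordinary least squares is an assumption here, as are the function names and the hypothetical age and weight values.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for Y = intercept + slope * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Predicted Y value for a known X value."""
    return intercept + slope * x

# Hypothetical ages (years, X) and weights (kg, Y), not taken from the slides
ages = [2, 4, 6, 8, 10]
weights = [12, 15, 21, 24, 29]
slope, intercept = fit_line(ages, weights)

print(f"weight = {intercept:.2f} + {slope:.2f} * age")
print(f"predicted weight at age 7: {predict(7, slope, intercept):.2f}")
```

The correlation coefficient summarises how strong the linear association is, while the fitted line is what allows a prediction to be made.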

Summarising the Data: Measures of Central Tendency and Variability

Measures of Central Tendency

Measures of central tendency describe the center of the data set, i.e. where the
observations are concentrated. There are numerous measures of central tendency:
mean, median, mode, geometric mean and harmonic mean.

Mean (Arithmetic Mean) or Average

This is the most appropriate measure for
data following a normal distribution. It
is calculated by summing all the
observations and then dividing by the
number of observations. It is
generally denoted by x̄ (x-bar).

It is the simplest of the centrality measures but is
influenced by extreme values and hence at times may
give fallacious results.

It depends on all values of the data set but is affected
by the fluctuations of sampling.

Example: The serum cholesterol levels (mg/dl) of 10 subjects
were found to be as follows:

192 242 203 212 175 284 256 218 182 228
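
Worked through in a short sketch, the mean of the ten cholesterol values above is the sum of the observations divided by their number. The function name `arithmetic_mean` is an assumption for illustration.

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)

cholesterol = [192, 242, 203, 212, 175, 284, 256, 218, 182, 228]  # mg/dl
total = sum(cholesterol)
mean = arithmetic_mean(cholesterol)

print(f"sum = {total}, n = {len(cholesterol)}, mean = {mean} mg/dl")
```

Here the sum is 2192 and there are 10 observations, so the mean is 219.2 mg/dl.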

Median
When the data are skewed, another measure of central tendency called
the median is used.

The median is a positional measure: it is the middlemost observation
after all the values are arranged in ascending or descending order.

When there is an odd number of observations, there is a single
middle value, which is the median.

When there is an even number of observations, there are two
middle values and the median is calculated by taking the mean of
these two middle observations.

It is less affected by extreme values than the mean.
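
The odd and even cases can be sketched as follows, using the ten serum cholesterol values from the mean example (even case) and the first nine of them (odd case). The function name `median` is an assumption for illustration.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

cholesterol = [192, 242, 203, 212, 175, 284, 256, 218, 182, 228]  # mg/dl

even_case = median(cholesterol)      # 10 values: mean of 212 and 218
odd_case = median(cholesterol[:9])   # 9 values: single middle value

print(even_case, odd_case)
```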

Mode
The mode is the most common value, i.e. the value that repeats
itself most often in the data set.

Though the mode is easy to calculate, at times it may be
impossible to calculate if no value repeats itself in the
data set.

At the other extreme, we may come across two or more values
that repeat themselves the same number of times. In such cases
the distribution is said to be bimodal or multimodal.
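
The three situations described above (one mode, no mode, several modes) can be sketched as follows. The function name `modes` and the small samples are hypothetical; returning an empty list when no value repeats is a convention chosen for this sketch.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 3, 3, 4, 5]))   # one value repeats most often
print(modes([1, 1, 2, 2, 3]))   # two values tie: bimodal
print(modes([4, 7, 9]))         # nothing repeats: no mode
```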

Measures of Relative Position (Quantiles)
Quantiles are the values that divide a set of numerical data arranged in
increasing order into a number of equal parts.

Quartiles divide the numerical data arranged in increasing order into four
equal parts of 25% each.
• Thus there are 3 quartiles: Q1, Q2 and Q3.

Deciles are the values that divide the arranged data into ten equal parts of 10%
each.
• Thus there are 9 deciles.

Percentiles are the values that divide the arranged data into a hundred equal
parts of 1% each.
• Thus there are 99 percentiles.
Q) Median = ___ percentile, ___ decile and ___ quartile.

Answer: The 50th percentile, 5th decile and 2nd quartile
are equal to the median.
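
The answer can be checked with a short sketch using Python's standard `statistics.quantiles` on the ten serum cholesterol values from the mean example. The default cut-point method of that function is assumed; other interpolation conventions can change individual quantiles, but the 2nd quartile, 5th decile and 50th percentile still coincide with the median.

```python
import statistics

cholesterol = [192, 242, 203, 212, 175, 284, 256, 218, 182, 228]  # mg/dl

quartiles = statistics.quantiles(cholesterol, n=4)      # 3 cut points
deciles = statistics.quantiles(cholesterol, n=10)       # 9 cut points
percentiles = statistics.quantiles(cholesterol, n=100)  # 99 cut points
median = statistics.median(cholesterol)

print(len(quartiles), len(deciles), len(percentiles))
# Lists are 0-indexed: Q2 is index 1, the 5th decile is index 4,
# and the 50th percentile is index 49
print(quartiles[1], deciles[4], percentiles[49], median)
```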

Measures of Variability

In contrast to measures of central
tendency, which describe the
center of the data set, measures of
variability describe the spread
of the observations around the
center of the data.

Various measures of dispersion
are as follows.
• Range
• Interquartile range
• Mean deviation
• Standard deviation
• Coefficient of variation

Range
One of the simplest measures of variability is the range. The range
is the difference between the two extremes, i.e. the difference
between the maximum and minimum observations.

Range = maximum observation - minimum observation

It gives a rough idea of the dispersion of the data.

The drawback of the range is that it uses only the extreme
observations and ignores the rest.

Interquartile Range

Just as the range is the difference between the extreme
observations, the interquartile range is the difference
between the two extreme quartiles, Q3 and Q1.

Interquartile range = Q3 - Q1
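
Both measures can be sketched for the ten serum cholesterol values from the mean example. The quartiles come from Python's standard `statistics.quantiles` with its default method, which is an assumption; other quartile conventions give slightly different Q1 and Q3 and therefore a different interquartile range.

```python
import statistics

cholesterol = [192, 242, 203, 212, 175, 284, 256, 218, 182, 228]  # mg/dl

# Range uses only the two extreme observations
data_range = max(cholesterol) - min(cholesterol)

# Interquartile range uses the two extreme quartiles
q1, _, q3 = statistics.quantiles(cholesterol, n=4)
iqr = q3 - q1

print(f"Range = {max(cholesterol)} - {min(cholesterol)} = {data_range}")
print(f"IQR   = {q3} - {q1} = {iqr}")
```

The interquartile range covers the middle 50% of the observations, so it is not driven by the two extremes in the way the range is.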

Coefficient of Variation
Besides the measures of variability discussed above, we have one
more important measure called the coefficient of variation, which
compares the variability in two data sets.
• It measures variability in relation to the mean (or average) and
is used to compare the relative dispersion in one type of data
with the relative dispersion in another type of data.
• The data to be compared may be in the same units, in different
units, with the same mean, or with different means.
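
The slides do not state the formula; the usual definition is CV = (standard deviation / mean) × 100. The sketch below applies it to the serum cholesterol values from the mean example and to a hypothetical set of heights in different units. Using the sample standard deviation (`statistics.stdev`, dividing by n - 1) is an assumption of this sketch.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

cholesterol = [192, 242, 203, 212, 175, 284, 256, 218, 182, 228]  # mg/dl
heights = [160, 165, 170, 175, 180]  # cm, hypothetical values

cv_cholesterol = coefficient_of_variation(cholesterol)
cv_heights = coefficient_of_variation(heights)

print(f"CV of cholesterol: {cv_cholesterol:.1f}%")
print(f"CV of heights:     {cv_heights:.1f}%")
```

Because the coefficient of variation is a unit-free percentage, the two values can be compared directly even though one data set is in mg/dl and the other in cm.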
