Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data.
2. Definition
•Descriptive statistics are brief descriptive coefficients that summarize a given
data set, which can be either a representation of the entire population or a sample
of it. Descriptive statistics are broken down into measures of central tendency and
measures of variability, or spread.
•Measures of variability or spread include the standard deviation (or variance) and the
minimum and maximum values of the variables; kurtosis and skewness describe the shape
of the distribution.
•Descriptive statistics are either quantitative (summary statistics) or visual (simple
graphs)
•Descriptive statistics are limited in that they only allow you to summarize the
people or objects that you have actually measured. On their own, they cannot be
used to generalize to other people or objects (i.e., using data from a sample to
infer the properties/parameters of a population).
3. Use in Statistical analysis
Univariate analysis
• It describes the distribution of a single variable.
• It includes central tendency (mean, median and mode), dispersion (range and quantiles) and
spread (variance and standard deviation).
• The shape of the distribution is also described by skewness and kurtosis. The distribution itself
can be represented graphically by a histogram.
Bivariate and multivariate analysis
• Bivariate analysis is the simultaneous analysis of two variables (attributes)
• Explores the concept of relationship between two variables, whether there exists an association
and the strength of this association, or whether there are differences between two variables and
the significance of these differences.
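One common way to quantify the strength of an association between two numeric variables is the Pearson correlation coefficient. The slides do not name a specific measure, so the choice of Pearson's r here is an assumption, and the function name and sample data are hypothetical. A minimal sketch:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations (proportional to the covariance).
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]
print(pearson_r(hours, scores))
```

A value near +1 or -1 suggests a strong linear association and a value near 0 a weak one. The sketch raises `ZeroDivisionError` if either variable is constant.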
4. Maximum and Minimum
• Minimum is the smallest value in the data set. This number is the data value that is less than or
equal to all other values in our set of data
• Maximum is the largest value in the dataset. This number is the data value that is greater than or
equal to all other values in our set of data
• The maximum and minimum provide good examples of the type of descriptive statistic that is easy
to marginalize. Despite these two numbers being extremely easy to determine, they make
appearances in the calculation of other descriptive statistics
Uses:
• Both the maximum and the minimum are used to calculate the range
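The definitions above can be checked directly in code. This is a small sketch on hypothetical sample values, using Python's built-in `min` and `max`:

```python
# Hypothetical sample data, chosen only for illustration.
values = [12, 7, 3, 19, 7, 15]

smallest = min(values)  # less than or equal to every value in the data
largest = max(values)   # greater than or equal to every value in the data

# Confirm the two defining properties stated on the slide.
assert all(smallest <= v for v in values)
assert all(largest >= v for v in values)

print(smallest, largest)
```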
5. Mean
• The mean should not be relied on when there is a huge
difference between values in the dataset
eg:- the mean salary of employees across
all positions in the organization
• The mean can’t be used for categorical data
• The "mean" is the "average", where we add up
all the numbers and then divide by the
number of numbers
• The mean gives the average value of a
particular variable
Applicable variable type :
Interval and Ratio level data
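The salary example can be made concrete with a short sketch. The salary figures below are hypothetical, and `statistics.mean` from the Python standard library is used only to cross-check the manual calculation:

```python
from statistics import mean

# Hypothetical monthly salaries; the last entry stands in for a senior executive.
salaries = [30_000, 32_000, 35_000, 38_000, 40_000, 400_000]

# "Add up all the numbers and then divide by the number of numbers."
manual_mean = sum(salaries) / len(salaries)

# The standard library computes the same quantity.
library_mean = mean(salaries)

# Dropping the one extreme salary shows how far it pulled the mean.
mean_without_extreme = mean(salaries[:-1])

print(round(manual_mean, 2), round(library_mean, 2), mean_without_extreme)
```

With the extreme salary included, the mean is higher than every other salary in the list, which is why it is a poor summary for data with such large differences.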
6. Median
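• The median is the middle value of the sorted dataset; how it is found depends on
whether the number of elements in the dataset is odd or even
• If the number of elements is odd, the median is the single middle value
• If the number of elements is even, add the two centre values and
divide by two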
Applicable variable type :
Ordinal, Interval and Ratio level data
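The odd/even rule can be sketched as a small function. The function name and sample values are hypothetical; the standard library's `statistics.median` applies the same rule and could be used instead:

```python
def median(values):
    """Median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(values)  # the rule only works on sorted data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: add the two centre values and divide by two.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([5, 1, 3]))
print(median([4, 1, 3, 2]))
```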
7. Mode
• The “mode” of a dataset is the element that occurs most often
• The mode is useful when there are huge differences between values in the dataset,
and it is the measure used for categorical data
Applicable variable type :
Nominal, Ordinal, Interval and Ratio level data
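Because the mode only counts occurrences, it works on categorical values as well as numbers. The sketch below uses `collections.Counter`; the function name `modes` and the sample data are hypothetical. It returns every value tied for the highest count, since a dataset can have more than one mode:

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often, in first-seen order."""
    if not values:
        raise ValueError("mode of an empty dataset is undefined")
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

# Categorical (nominal) data: no arithmetic is needed, only counting.
print(modes(["red", "blue", "red", "green"]))
# Two values tie for the highest count here.
print(modes([1, 1, 2, 2, 3]))
```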
8. Range
• The Range is the difference between the lowest and highest
values
• The range can sometimes be misleading when there are
extremely high or low values
• The range is calculated from the maximum and minimum values in the
dataset (maximum minus minimum)
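A short sketch shows why the range can mislead when an extreme value is present. The function name and the ages below are hypothetical:

```python
def value_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)

# Hypothetical ages of a group of people.
typical = [21, 23, 24, 26, 27]
with_outlier = typical + [95]  # one extremely high value added

print(value_range(typical))       # spread of the tightly grouped ages
print(value_range(with_outlier))  # dominated by the single extreme value
```

Five of the six values in the second list still lie within a few years of each other, yet the range no longer reflects that.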
9. Quartiles
Definition:
Quartiles are measures of position that divide
a group of data into four subgroups or parts. The three
quartiles are denoted as Q1, Q2, and Q3.
Explanation:
• The first quartile, Q1, separates the first, or lowest,
one-fourth of the data from the upper three-fourths
and is equal to the 25th percentile.
• The second quartile, Q2, separates the second
quarter of the data from the third quarter. Q2 is
located at the 50th percentile and equals the median
of the data.
• The third quartile, Q3, divides the first three-
quarters of the data from the last quarter and is
equal to the value of the 75th percentile
Applicable variable type :
Ordinal, Interval and Ratio level data
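Quartiles can be computed with `statistics.quantiles` from the Python standard library (Python 3.8+). The data below is hypothetical. Several conventions exist for interpolating quartiles, so other tools may give slightly different values for the same data; this sketch uses the function's default method:

```python
from statistics import median, quantiles

# Hypothetical dataset of 11 observations.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 15]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(data, n=4)

# Q2 is the 50th percentile, so it should match the median.
assert q2 == median(data)

# The interquartile range (Q3 - Q1) covers the middle half of the data.
iqr = q3 - q1

print(q1, q2, q3, iqr)
```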
10. Skewness
• Skewness is a measure of asymmetry. If the
distribution represented by S is perfectly
symmetric, then the skewness of S is zero. If the
skewness is negative, then the distribution is
skewed to the left (longer left tail), while if the
skewness is positive then the distribution is
skewed to the right (longer right tail)
• Skewness tells us about the direction of variation
of the data set
• Skewness is a measure that studies the degree and
direction of departure from symmetry
Interpretation:
If skewness is equal to zero, the distribution shows no skew
(a normal distribution, for example, has zero skewness)
If skewness is greater than zero, it is positive
skewness
If skewness is less than zero, it is negative skewness
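The slides do not give a formula for skewness. A common choice, assumed here, is the moment-based population skewness: the third central moment divided by the second central moment raised to the power 1.5. The function name and sample data are hypothetical:

```python
def skewness(values):
    """Moment-based population skewness: m3 / m2**1.5."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 10]       # long tail towards high values
left_skewed = [-10, -3, -2, -2, -1]   # long tail towards low values

for sample in (symmetric, right_skewed, left_skewed):
    print(round(skewness(sample), 3))
```

The sign of each result matches the interpretation above: zero for the symmetric sample, positive for the right-skewed one, and negative for the left-skewed one. The sketch raises `ZeroDivisionError` if all values are identical.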
11. Kurtosis
• Kurtosis is a statistical measure that describes how heavy the
tails of the distribution of observed data are relative to a normal
distribution; in finance it is sometimes referred to as the
volatility of volatility
• Kurtosis is generally used in statistics to describe the shape of
a distribution in a chart. A distribution can have fat tails (high
kurtosis) or skinny tails (low kurtosis)
• When two or more symmetrical distributions are
compared, the differences between them are studied using kurtosis
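As with skewness, the slides give no formula for kurtosis. A common moment-based definition, assumed here, is the fourth central moment divided by the square of the second central moment. A normal distribution scores 3 on this scale, and "excess kurtosis" subtracts 3 from it. The function name and sample data are hypothetical:

```python
def kurtosis(values):
    """Moment-based population kurtosis: m4 / m2**2 (normal distribution = 3)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2

# Two symmetric samples with the same mean but different tails.
even_spread = [1, 2, 3, 4, 5]
fat_tails = [-10, -1, 0, 0, 0, 1, 10]  # most values near the mean, two far out

print(round(kurtosis(even_spread), 3))
print(round(kurtosis(fat_tails), 3))
```

Both samples are symmetric, so skewness cannot tell them apart; the higher kurtosis of the second sample reflects its extreme values.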