Python Notes for mca i year students osmania university.docx
Data Analysis
1. Data Analysis
Prof. (Dr.) Smriti Arora
Amity College of Nursing
Amity University Haryana
smritiamit@msn.com, sarora1@ggn.amity.edu
2. Introduction
• Process of converting raw data into meaningful information.
• Based on objectives
• Arranged according to objectives in Sections
• Descriptive and Inferential
• Specify the level of significance
• State the null hypothesis (H0)
• Specify how the analysis is done: manually or through software such as SPSS/Stata
• Results must be stated as rejecting or failing to reject the null hypothesis
3. How to decide what test to use?
• What type of analysis? Univariate, bivariate or multivariate
• How have we measured the variables? Categorical or continuous
• Is the data normal or non-normal?
• How many categories of variables? 2 or >2
• Are the groups related or not (within or between comparison)?
4. Data Analysis
(For any data from any specialty / discipline)
• Uni-variate (one variable at a time)
• Bi-variate (two variables at a time)
• Multi-variate (more than two variables at a time)
May decide to perform one or all of the above depending
on the need
There is no other way of data analysis
6. Univariate Analysis
Categorical variable
• Proportion or percentages
• Rate
• Prevalence
• Incidence
Quantitative variable
Measures of Central Tendency
• Mean
• Median
• Mode
Measures of Location
• Quartiles (Q1, Q2, Q3)
• Deciles (D1, D2, ..., D5, ..., D9)
• Percentiles (P1, P2, ..., P50, ..., P99)
Measures of Variation
• Range
• Quartile Deviation
• Mean Deviation
• Standard Deviation
• Coefficient of Variation
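The measures listed for a quantitative variable can be sketched in a few lines of Python. This is only an illustration: the blood pressure readings are made-up values, and the quartiles use the default method of Python's statistics module, so SPSS or other software may report slightly different quartile values for the same data.

```python
import statistics

# Hypothetical sample: systolic blood pressure readings (mmHg), made up for illustration.
values = [110, 118, 120, 120, 124, 130, 136, 142]

# Measures of central tendency
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Measures of location: n=4 returns the three cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(values, n=4)

# Measures of variation
value_range = max(values) - min(values)
quartile_deviation = (q3 - q1) / 2
mean_deviation = sum(abs(v - mean) for v in values) / len(values)
std_dev = statistics.stdev(values)        # sample standard deviation (divides by n - 1)
coeff_variation = std_dev / mean * 100    # expressed as a percentage of the mean

print("Mean:", mean, "Median:", median, "Mode:", mode)
print("Q1, Q2, Q3:", q1, q2, q3)
print("Range:", value_range, "Quartile deviation:", quartile_deviation)
print("Mean deviation:", mean_deviation)
print("Standard deviation:", round(std_dev, 2))
print("Coefficient of variation (%):", round(coeff_variation, 2))
```

Note that Q2 is the same value as the median, which is why the median is also counted among the positional measures.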
8. Normal distributions with larger
standard deviations are more “spread
out,” while normal distributions with
smaller standard deviations are more
“compact.”
10. • Percentiles are defined as the values that divide the whole series into 100 equal parts.
• So, there are 99 percentiles, namely the first percentile denoted by P1, the second percentile denoted by P2, ... and the 99th percentile denoted by P99.
• The 50th percentile is the median.
• Since a percentile denotes the position of an item in the series, it is a positional average.
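A minimal sketch of the points above, assuming a made-up list of exam scores. It uses statistics.quantiles with n=100, which returns the 99 cut points P1 to P99; the exact values depend on the interpolation method, so other software may differ slightly.

```python
import statistics

# Hypothetical exam scores, made up for illustration (19 values, already sorted).
scores = [35, 42, 47, 51, 55, 58, 60, 63, 67, 70,
          72, 75, 78, 81, 85, 88, 90, 93, 96]

# n=100 divides the series into 100 parts, giving 99 cut points P1 ... P99.
percentiles = statistics.quantiles(scores, n=100)

p25 = percentiles[24]   # P25 (index is percentile number minus 1)
p50 = percentiles[49]   # P50
p75 = percentiles[74]   # P75

print("Number of percentiles:", len(percentiles))
print("P25:", p25, "P50:", p50, "P75:", p75)
print("Median:", statistics.median(scores))
```

P50 and the median come out as the same value, which matches the statement that the 50th percentile is the median.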
11. Levels of measurement
Nominal level
• The nominal type differentiates between items or subjects based only on their names.
• Examples: gender, nationality, ethnicity, blood group
Ordinal scale
• It allows for rank order (1st, 2nd, 3rd, etc.) by which data can be sorted, such as 'completely agree', 'mostly agree', 'mostly disagree', 'completely disagree' when measuring opinion.
• Examples: pain, anxiety, depression, stress
Interval scale
• It allows for the degree of difference between items, but not the ratio between them.
• Examples include temperature on the Celsius scale, which has two defined points (the freezing and boiling points of water at specific conditions) separated into 100 intervals.
• Ratios are not allowed, since 20 °C cannot be said to be "twice as hot" as 10 °C, nor can multiplication/division be carried out.
Ratio scale
• It is the estimation of the ratio between a magnitude of a continuous quantity and a unit magnitude of the same kind.
• A ratio scale possesses a meaningful (unique and non-arbitrary) zero value.
• Examples: height, weight, KAP scores, BMI
• Ratios are allowed because having a non-arbitrary zero point makes it meaningful to say, for example, that one object has "twice the length" of another.
14. Assessing normality
• Analyze, Descriptive Statistics, Explore
• Plots, normality tests, histogram
• Kolmogorov-Smirnov test, Shapiro-Wilk test
• P value should be above 0.05 for the data to be treated as normal
• https://www.youtube.com/watch?v=2GRZ_d4ftoo
15. Inferential Statistics
Parametric tests
• Used when data is normal
• Include comparisons of means
• Used for continuous data
• Probability sampling
• Parametric tests usually have more statistical power than nonparametric tests
• t test, ANOVA, regression analysis, correlation coefficient
Nonparametric tests
• Used when data is non-normal
• Comparison of medians
• Non-probability sampling
• Used for categorical (nominal, ordinal or discrete) data
• Kruskal-Wallis, McNemar, Mann-Whitney U, Friedman test
16. Hypothesis testing procedure
• State null hypothesis
• Determine level of significance: 0.05 or 0.01
• Select the test statistic
• Compute the test statistic
• Calculate degrees of freedom
• Compare test value with tabled value.
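The steps above can be sketched with a hand-computed test. The slide does not name a specific test, so this example assumes an independent-samples t test with pooled variance; the two groups of pain scores are made-up data, and the critical value 2.306 is the standard two-tailed t table entry for 8 degrees of freedom at the 0.05 level.

```python
import math
import statistics

# Hypothetical pain scores for two independent groups (made up for illustration).
group_a = [6, 7, 5, 8, 6]   # e.g. control group
group_b = [4, 5, 3, 5, 4]   # e.g. intervention group

# Step 1: null hypothesis H0: the two group means are equal.
# Step 2: level of significance.
alpha = 0.05

# Steps 3 and 4: select and compute the test statistic (pooled-variance t).
n_a, n_b = len(group_a), len(group_b)
mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_stat = (mean_a - mean_b) / std_error

# Step 5: degrees of freedom.
df = n_a + n_b - 2

# Step 6: compare with the tabled value (two-tailed, alpha = 0.05, df = 8).
critical_t = 2.306
reject_null = abs(t_stat) > critical_t

print("t =", round(t_stat, 3), "df =", df)
print("Reject H0" if reject_null else "Fail to reject H0")
```

With software such as SPSS the last step is usually replaced by reading the p value and comparing it with the chosen level of significance.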
20. Constructing Bivariate Tables
• DV goes in the rows, IV goes in the columns
             | Music therapy given | Music therapy not given
Pain present |                     |
Pain absent  |                     |
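As a rough sketch, the same table can be filled in by counting records. The 20 records below are made-up data, and the level names ("given", "present", and so on) are hypothetical labels chosen for this example. Pain (the DV) goes in the rows and music therapy (the IV) goes in the columns.

```python
from collections import Counter

# Hypothetical records as (music therapy, pain) pairs, made up for illustration.
records = (
    [("given", "present")] * 3
    + [("given", "absent")] * 7
    + [("not given", "present")] * 6
    + [("not given", "absent")] * 4
)

counts = Counter(records)
iv_levels = ["given", "not given"]   # independent variable: columns
dv_levels = ["present", "absent"]    # dependent variable: rows

# table[row][column] holds the count for that cell.
table = {dv: {iv: counts[(iv, dv)] for iv in iv_levels} for dv in dv_levels}

# Column percentages: share of each IV group that has pain present.
column_totals = {iv: sum(table[dv][iv] for dv in dv_levels) for iv in iv_levels}
pct_pain_present = {
    iv: 100 * table["present"][iv] / column_totals[iv] for iv in iv_levels
}

print("Pain".ljust(10), *(f"Therapy {iv}".ljust(20) for iv in iv_levels))
for dv in dv_levels:
    print(dv.ljust(10), *(str(table[dv][iv]).ljust(20) for iv in iv_levels))
print("Percent with pain present:", pct_pain_present)
```

Percentages are calculated within each column, so the two IV groups can be compared directly.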
28. • Type 1 error: rejecting a null hypothesis when it is true (false positive)
• Type 2 error: failing to reject a null hypothesis when it is false (false negative)
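A small simulation can make the Type 1 error concrete. This is a sketch under stated assumptions: samples are drawn from a population where the null hypothesis is actually true (mean 50, standard deviation 10, both made-up values), and a two-tailed z test with known standard deviation is applied at the 0.05 level. Every rejection is then a false positive, and the share of rejections should come out close to 0.05.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

alpha = 0.05
mu0, sigma, n = 50, 10, 30    # hypothetical population mean, SD and sample size
trials = 2000

# Two-tailed critical value of the standard normal distribution (about 1.96).
critical_z = statistics.NormalDist().inv_cdf(1 - alpha / 2)

false_positives = 0
for _ in range(trials):
    # H0 is true here: every sample really comes from a population with mean mu0.
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (statistics.mean(sample) - mu0) / (sigma / n ** 0.5)
    if abs(z) > critical_z:
        false_positives += 1  # H0 rejected although it is true: Type 1 error

type1_rate = false_positives / trials
print("Observed Type 1 error rate:", type1_rate)
```

The level of significance chosen before the analysis is therefore the Type 1 error rate the researcher is willing to accept.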