1. Discrete Data Mapping: Problem of HR-Analytics
Debdulal Dutta Roy, Ph.D. (Psy.)
Psychology Research Unit
INDIAN STATISTICAL INSTITUTE, KOLKATA
Workshop: QIP-STC (AICTE) on HR Analytics - Hands-on Training
VGSOM, IIT Kharagpur
11.5.2015
2. HR analytics and Discrete data
• HR analytics broadly covers two approaches: association and
prediction. Discrete data mapping follows the former. It is a
multivariate statistical model for exploring associations among
data points. Associated discrete data form a neighbourhood, and the
map shows the distances among neighbourhoods,
e.g., neighbourhoods of human resource activities (recruitment,
training, placement, promotion, incentives, etc.) and of
employee performance (attrition, engagement, etc.). The model is
useful for big data (data from multiple companies). In this model,
multi-dimensional data are plotted on a two-dimensional plot. This
technique helps organizations identify relationships and
trends and predict future behaviours or events.
3. Truth is that you can measure
• Truth=Response – Error
• Any response is affected by fixed or random errors.
• Errors can be controlled through sampling, control of the
environment, instruments and statistics.
• Any response can be measured as discrete or continuous
data.
• Discrete data cannot be divided into fractions, but continuous data
can.
• Discrete data can be summarized by frequency or
percentage.
• The two types of data can be converted into each other by transformation.
• Transformation loses important properties of the original
data.
D. Dutta Roy, ISI., Kolkata
4. Discrete VS Continuous
• Discrete data can be numeric, like the number
of apples, but they can also be categorical, like
red or blue, male or female, or good or bad.
• Continuous data are not restricted
to defined separate values, but can take
any value over a continuous range.
Lecture notes: Discrete Data Mapping by
D. Dutta Roy, ISI., Kolkata
5. HR Analytics
• HR analytics data include head counts (numbers of
people) for recruitment, training, placement,
promotion, incentives, etc. and for employee
performance such as attrition, engagement, etc.
• Analytics can prepare one-way, two-way or multi-way
tables.
• A stem-and-leaf plot can be used to map discrete
data.
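A stem-and-leaf plot of a one-way table can be sketched in a few lines of Python. This is only an illustration: the function name `stem_and_leaf` and the head counts below are hypothetical, and the sketch assumes non-negative whole numbers with the last digit used as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf plot for non-negative integers.

    The tens part of each value is the stem and the last digit is the leaf.
    """
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    lines = []
    # Walk every stem in range so that empty stems still show up as gaps.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

# Hypothetical monthly recruitment head counts (not from the lecture data).
head_counts = [12, 15, 21, 23, 23, 30, 34, 47, 8]
for line in stem_and_leaf(head_counts):
    print(line)
```

Each printed row keeps the individual values visible, which a plain frequency count would hide.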
6. Stem-Leaf Plot of One-way table of Discrete data
7. Two-Way table or Crosstabulation
• Cross tabulation is a combination of two (or more) frequency tables
arranged such that each cell in the resulting table represents a
unique combination of specific values of crosstabulated variables.
• Thus, crosstabulation allows us to examine frequencies of
observations that belong to specific categories on more than one
variable.
• By examining these frequencies, we can identify relations between
crosstabulated variables. Only categorical (nominal) variables or
variables with a relatively small number of different meaningful
values should be crosstabulated.
• Note that in cases where we do want to include a continuous
variable in a crosstabulation (e.g., income), we can first recode it
into a particular number of distinct ranges (e.g., low, medium,
high).
• Crosstabulation can be computed with a PivotTable in MS Excel.
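The recoding and counting steps described above can also be sketched in plain Python. The income cut-offs, the employee records and the helper names `recode_income` and `crosstab` are assumptions made for illustration only.

```python
from collections import Counter

def recode_income(income):
    """Recode a continuous income into three ranges (hypothetical cut-offs)."""
    if income < 30000:
        return "low"
    if income < 60000:
        return "medium"
    return "high"

def crosstab(pairs):
    """Build a two-way frequency table from (row_category, column_category) pairs."""
    counts = Counter(pairs)
    rows = sorted({row for row, _ in counts})
    cols = sorted({col for _, col in counts})
    # A Counter returns 0 for combinations that never occur.
    table = [[counts[(row, col)] for col in cols] for row in rows]
    return rows, cols, table

# Hypothetical records: (annual income, attrition status).
employees = [
    (25000, "left"), (28000, "left"), (29000, "stayed"),
    (42000, "stayed"), (45000, "left"), (51000, "stayed"),
    (65000, "stayed"), (72000, "stayed"),
]
pairs = [(recode_income(income), status) for income, status in employees]
rows, cols, table = crosstab(pairs)

print("income".ljust(8), *[col.rjust(7) for col in cols])
for row, cells in zip(rows, table):
    print(row.ljust(8), *[str(cell).rjust(7) for cell in cells])
```

Each cell counts the employees who share one combination of income range and attrition status, which is the same result a PivotTable would give.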
9. Test of Significance
• The Pearson Chi-square is the most common
test for significance of the relationship
between categorical variables.
• Coefficient Phi: It is a measure of correlation
between two categorical variables in a 2 x 2
table. Its value can range from 0 (no relation
between factors; Chi-square=0.0) to 1 (perfect
relation between the two factors in the table).
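As a rough illustration of both statistics, the sketch below computes the Pearson Chi-square (without continuity correction) and the Chi-square based Phi, sqrt(Chi-square / N), for a 2 x 2 table. The counts and the function names are hypothetical.

```python
import math

def chi_square(table):
    """Pearson Chi-square for a two-way table of counts; returns (chi2, n)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def phi_coefficient(table_2x2):
    """Phi for a 2 x 2 table, computed as sqrt(chi2 / n)."""
    chi2, n = chi_square(table_2x2)
    return math.sqrt(chi2 / n)

# Hypothetical counts: rows = trained / not trained, columns = stayed / left.
table = [[40, 10], [20, 30]]
chi2, n = chi_square(table)
phi = phi_coefficient(table)
print(f"chi-square = {chi2:.3f}, N = {n}, phi = {phi:.3f}")
```

A table whose rows are exactly proportional, such as [[10, 20], [20, 40]], gives Chi-square = 0 and Phi = 0.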
10. Coefficient of Contingency
• The coefficient of contingency is a Chi-square
based measure of the relation between two
categorical variables (proposed by Pearson,
the originator of the Chi-square test). Its
advantage over the ordinary Chi-square is that
it is more easily interpreted, since its range is
always limited to 0 through 1 (where 0 means
complete independence).
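The coefficient is commonly computed as C = sqrt(Chi-square / (Chi-square + N)). The sketch below, with hypothetical counts and function names, also illustrates one caveat: although C lies between 0 and 1, the largest value it can reach depends on the table size, e.g. sqrt((k - 1) / k) for a k x k table, so it stays below 1 even under perfect association.

```python
import math

def chi_square(table):
    """Pearson Chi-square for a two-way table of counts; returns (chi2, n)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = sum(
        (observed - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, observed in enumerate(row)
    )
    return chi2, n

def contingency_coefficient(table):
    """Pearson's coefficient of contingency, C = sqrt(chi2 / (chi2 + n))."""
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))

# Hypothetical 3 x 3 table with perfect association between rows and columns.
perfect = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
c_value = contingency_coefficient(perfect)
print(f"C = {c_value:.4f}, upper bound for 3 x 3 = {math.sqrt(2 / 3):.4f}")
```

Because of this ceiling, values of C are easiest to compare between tables of the same size.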
11. Correspondence Analysis
• The Crosstabs procedure offers several
measures of association and tests of
association but cannot graphically represent
any relationships between the variables.
• Correspondence analysis aims to describe the
relationships between two nominal variables
in a correspondence table in a low-dimensional
space.
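A minimal sketch of simple correspondence analysis follows, assuming NumPy is available. It uses the standard textbook route (singular value decomposition of the standardized residuals of the correspondence table), which may differ in detail from the software used in the lecture. The counts and the function name are hypothetical.

```python
import numpy as np

def correspondence_analysis(table, n_dims=2):
    """Simple correspondence analysis of a two-way table of counts.

    Returns row and column principal coordinates (first n_dims axes)
    and the principal inertia of every axis.
    """
    counts = np.asarray(table, dtype=float)
    P = counts / counts.sum()          # correspondence matrix
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    expected = np.outer(r, c)          # proportions expected under independence
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return row_coords[:, :n_dims], col_coords[:, :n_dims], sv ** 2

# Hypothetical table: 4 HR activities (rows) by 3 performance categories (columns).
counts = [[30, 10, 5], [12, 25, 8], [6, 9, 28], [10, 12, 11]]
row_xy, col_xy, inertia = correspondence_analysis(counts)
print("row points:\n", row_xy.round(3))
print("column points:\n", col_xy.round(3))
print("share of inertia per axis:", (inertia / inertia.sum()).round(3))
```

The total inertia equals Chi-square / N, so the map can be read as a graphical decomposition of the same association that the Chi-square test summarizes in one number.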
14. Neighbourhood
• In the frequency table, there are 6 column variables and
7 row variables. A neighbourhood can be
formed by clustering the row, column and
row-column correspondences.
• So, partitioning of the row and column
variables is important.
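One simple way to think about a neighbourhood is through distances between points on the map. The sketch below is only an assumption about how that could be done, not the procedure used in the lecture: the coordinates are hypothetical, and it simply finds the nearest row point by Euclidean distance.

```python
import math

# Hypothetical two-dimensional map coordinates for row points
# (illustrative values, not taken from the lecture data).
points = {
    "recruitment": (0.10, 0.05),
    "training": (0.15, 0.10),
    "promotion": (-0.60, 0.40),
    "incentives": (-0.55, 0.35),
}

def nearest_neighbour(name, coords):
    """Return the point closest to `name` by Euclidean distance on the map."""
    here = coords[name]
    others = (other for other in coords if other != name)
    return min(others, key=lambda other: math.dist(here, coords[other]))

for name in points:
    print(f"{name:>12} -> {nearest_neighbour(name, points)}")
```

Points that name each other as nearest neighbours, such as recruitment and training here, would be candidates for the same neighbourhood.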
16. Neighbourhood Data Mapping
(N=902)
17. Where Chi-Square fails, this model works
(Job Analysis Data, N=200)