2. Prepare Data for Analysis
Data preparation involves several steps:
Questionnaire checking: Questionnaire checking involves
eliminating unacceptable questionnaires. A questionnaire may be
unacceptable because it is incomplete, the instructions were not
followed, the answers show little variance, pages are missing, it
arrived past the cutoff date, or the respondent is not qualified.
Editing: Editing corrects illegible, incomplete, inconsistent,
and ambiguous answers.
Coding: Coding typically assigns symbols or numeric codes to
answers that do not already have them so that statistical techniques
can be applied.
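As a minimal sketch of the coding step, the snippet below maps text answers to numeric codes. The codebook labels, the numeric values, and the `MISSING` code are illustrative assumptions, not part of the source material.

```python
# Hypothetical codebook: labels and numeric codes are illustrative only.
CODEBOOK = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}
MISSING = -9  # assumed convention for blank or unrecognized answers


def code_answer(answer):
    """Return the numeric code for a text answer, or MISSING if unknown."""
    key = answer.strip().lower()
    return CODEBOOK.get(key, MISSING)


responses = ["Agree", " neutral ", "", "Strongly agree"]
coded = [code_answer(r) for r in responses]
print(coded)
```

Once answers are coded this way, they can be counted, averaged, or cross-tabulated like any other numeric variable.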
3. Prepare Data for Analysis
Transcribing: Transcribing data involves transferring data so as to
make it accessible to people or applications for further processing.
Cleaning: Cleaning reviews data for consistency. Inconsistencies
may arise from faulty logic or from out-of-range or extreme values.
Statistical adjustments: Statistical adjustments apply to data that
require weighting and scale transformations.
Analysis strategy selection: Finally, selection of a data analysis
strategy is based on earlier work in designing the research project but
is finalized after consideration of the characteristics of the data that
has been gathered.
https://www.cvent.com/en/blog/events/7-steps-prepare-data-analysis
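The cleaning and weighting steps above can be sketched as follows. The records, the field names, the valid age range, and the population shares are all made-up assumptions, and dividing population share by sample share is only one common weighting convention.

```python
from collections import Counter

# Hypothetical survey records; field names and values are assumptions.
records = [
    {"id": 1, "age": 34, "group": "urban"},
    {"id": 2, "age": 240, "group": "rural"},  # out-of-range value
    {"id": 3, "age": 51, "group": "rural"},
]
AGE_RANGE = (18, 99)  # assumed valid range for this illustration


def out_of_range(record):
    """Return True when the age falls outside the assumed valid range."""
    low, high = AGE_RANGE
    return not (low <= record["age"] <= high)


# Cleaning: flag out-of-range records and keep the rest.
flagged = [r["id"] for r in records if out_of_range(r)]
clean = [r for r in records if not out_of_range(r)]

# Weighting: weight = population share / sample share for each group.
population_share = {"urban": 0.6, "rural": 0.4}  # assumed known shares
sample_counts = Counter(r["group"] for r in clean)
weights = {
    group: population_share[group] / (count / len(clean))
    for group, count in sample_counts.items()
}
print(flagged, weights)
```

In practice a flagged record would usually be checked against the original questionnaire before it is dropped.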
4. Graphical presentation: Bar Chart
A bar chart or bar graph is a chart or graph that presents categorical
data with rectangular bars with heights or lengths proportional to the
values that they represent. The bars can be plotted vertically or
horizontally. A vertical bar chart is sometimes called a column chart.
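To illustrate the proportionality idea, here is a small sketch that draws a horizontal bar chart as text, using only the standard library. The category names and values are hypothetical, and the function assumes at least one positive value.

```python
def text_bar_chart(data, width=40):
    """Return the lines of a horizontal bar chart.

    Each bar's length is proportional to its value; the largest value
    gets a bar of `width` characters. Assumes the largest value is positive.
    """
    largest = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<8} {bar} {value}")
    return lines


sales = {"P1": 40, "P2": 20, "P3": 10}  # hypothetical values
chart = text_bar_chart(sales, width=20)
print("\n".join(chart))
```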
5. Graphical presentation: Pie Chart
A pie chart (or a circle chart) is a circular statistical graphic, which is
divided into slices to illustrate numerical proportion. In a pie chart,
the arc length of each slice (and consequently its central angle and
area) is proportional to the quantity it represents.
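Because each slice is proportional to its quantity, the central angle of a slice is 360 degrees times its share of the total. The sketch below computes those angles; the category names and values are hypothetical.

```python
def slice_angles(data):
    """Return the central angle in degrees for each slice of a pie chart."""
    total = sum(data.values())
    return {label: 360 * value / total for label, value in data.items()}


shares = {"Visa": 50, "MasterCard": 30, "Cash": 20}  # hypothetical values
angles = slice_angles(shares)
print(angles)
```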
6. Frequency table
Frequency refers to the number of times an event or a value occurs.
A frequency table is a table that lists items and shows the number of
times the items occur.
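A frequency table can be built in a few lines with Python's `collections.Counter`. The answers below are made up for illustration.

```python
from collections import Counter

answers = ["yes", "no", "yes", "maybe", "yes", "no"]  # hypothetical data

# Counter maps each distinct item to the number of times it occurs.
freq = Counter(answers)

# Print the table from most to least frequent.
for item, count in freq.most_common():
    print(f"{item:<6} {count}")
```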
7. Cross Tabulation: How It Works
Cross tabulation is a method to quantitatively analyze the relationship
between multiple variables. Also known as contingency tables or
cross tabs, cross tabulation groups variables to understand the
correlation between them. It also shows how correlations
change from one variable grouping to another. It is usually used in
statistical analysis to find patterns, trends, and probabilities within raw
data.
Cross tabulation is usually performed on categorical data — data that
can be divided into mutually exclusive groups.
Cross tabulations are used to examine relationships within data that
may not be readily apparent. Cross tabulation is especially useful for
studying market research or survey responses. Cross tabulation of
categorical data can be done with tools such as SPSS, SAS,
and Microsoft Excel.
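Outside those tools, the same idea can be sketched in plain Python by counting how often each pair of categories occurs together. The observations below are hypothetical and only illustrate the mechanics.

```python
from collections import Counter

# Hypothetical observations: (product category, payment method).
observations = [
    ("P1", "MasterCard"),
    ("P1", "Visa"),
    ("P2", "MasterCard"),
    ("P1", "MasterCard"),
    ("P2", "Visa"),
    ("P2", "Visa"),
]

# Count each (row, column) combination.
pair_counts = Counter(observations)
products = sorted({p for p, _ in observations})
methods = sorted({m for _, m in observations})

# Build the cross tab as a nested dict: crosstab[product][method] -> count.
crosstab = {p: {m: pair_counts[(p, m)] for m in methods} for p in products}

print(f"{'':<4}" + "".join(f"{m:>12}" for m in methods))
for p in products:
    print(f"{p:<4}" + "".join(f"{crosstab[p][m]:>12}" for m in methods))
```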
8. Cross Tabulation
Consider the below sample data set in Excel. It displays details about
commercial transactions for four product categories. Let’s use this data set
to show cross tabulation in action.
This data can be converted to pivot table format by selecting the entire table
and inserting a pivot table in the Excel file. The table can correlate different
variables row-wise, column-wise, or value-wise in either table format or
chart format.
9. Cross Tabulation
Then the results appear in a pivot table:
It is now clear that the highest sales were for P1 paid with MasterCard.
Therefore, we can conclude that the MasterCard payment method and
the P1 product category form the best-selling combination.
Similarly, we can use cross tabulation and find the relation between the
product category and the payment method type with regard to the number
of transactions.
https://humansofdata.atlan.com/2016/01/cross-tabulation-how-why/
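The pivot-table step, summing sales for each product and payment method and then picking the largest cell, can be approximated in Python as below. The transactions are invented for illustration and are not the data set from the slides.

```python
from collections import defaultdict

# Hypothetical transactions: (product, payment method, sale amount).
transactions = [
    ("P1", "MasterCard", 120.0),
    ("P1", "Visa", 80.0),
    ("P2", "MasterCard", 60.0),
    ("P1", "MasterCard", 50.0),
    ("P2", "Visa", 90.0),
]

# Sum the sale amounts for every (product, payment method) cell.
totals = defaultdict(float)
for product, method, amount in transactions:
    totals[(product, method)] += amount

# The cell with the highest total is the best-selling combination.
best = max(totals, key=totals.get)
print(best, totals[best])
```

Replacing the sum with a count of rows gives the number of transactions per combination instead of the sales total.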
10. Chi square test
A chi-square test is a statistical hypothesis test that is valid to perform
when the test statistic is chi-squared distributed under the null hypothesis.
A chi-square goodness of fit test determines whether sample data match
the distribution expected for a population.
A chi-square test for independence compares two variables in a
contingency table to see if they are related. In a more general sense, it
tests to see whether distributions of categorical variables differ from one
another.
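Formula: \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}, where O_i is the observed
count and E_i is the expected count in category i.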
https://www.youtube.com/watch?v=f53nXHoMXx4
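The chi-square statistic sums, over all categories, the squared difference between the observed and expected counts divided by the expected count. A minimal sketch for the goodness of fit case follows; the counts are hypothetical, and the p-value lookup is left out because it needs a chi-squared distribution table or a statistics library.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [18, 22, 20]  # hypothetical observed counts
expected = [20, 20, 20]  # counts expected under the null hypothesis
stat = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1
print(stat, degrees_of_freedom)
```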
11. Chi square test
Example: suppose a city has four neighborhoods: A, B, C, and D.
A random sample of 650 residents of the city is taken and
their occupation is recorded as "white collar", "blue collar", or "no collar".
The null hypothesis is that each person's neighborhood of residence is
independent of the person's occupational classification. The data are
tabulated as:
By the assumption of independence under the hypothesis we should
"expect" the number of white-collar workers in neighborhood A to be
https://www.youtube.com/watch?v=f53nXHoMXx4
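Under independence, the expected count for a cell of a contingency table is its row total times its column total, divided by the grand total. The sketch below applies that rule and then computes the chi-square statistic. The 2 x 2 counts are made up for illustration and are not the neighborhood data from the example.

```python
def expected_counts(table):
    """Return expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a contingency table."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


observed_table = [[30, 10], [20, 40]]  # hypothetical counts
expected_table = expected_counts(observed_table)
statistic, dof = chi_square_independence(observed_table)
print(expected_table, statistic, dof)
```

A large statistic relative to the chi-squared distribution with `dof` degrees of freedom is evidence against the null hypothesis of independence.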