This document provides an overview of different methods for analyzing and summarizing data, including statistics, graphs, and t-tests. It explains concepts like mean, median, range, and the Q-test. It also provides step-by-step instructions for conducting a t-test in Microsoft Excel to determine if the averages of two data sets are statistically different. Key tips are given to improve the reliability of t-test results.
Statistics Notes
1. Analyzing Data
"There are three kinds of lies: lies, damned lies and statistics."
~ Benjamin Disraeli
Advanced Biology
Mrs. Morgan
2. Using Data
"Statistics: The only science that enables different experts using the same figures to draw different conclusions."
- Evan Esar
After collecting data during lab investigations there are many ways to organize and analyze it.
3. Presenting Data
• Always present data in charts and graphs as well as in words
• Example:
– Table 1 shows the heart rate of subjects before and after exercise. The average of subjects’ heart rates shows a rise of 20.2 beats per minute after exercise.

Table 1
Subject   HR Before Exercise   HR After Exercise
1         60                   84
2         76                   80
3         62                   90
4         78                   110
5         70                   92
6         66                   92
7         70                   88
8         74                   80
9         78                   100
10        68                   88
Avg       70.2                 90.4
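As a quick check on the averages row, the table can be recomputed with a few lines of Python. This is only a sketch: the variable names are illustrative, and the numbers are copied from the ten subject rows of Table 1.

```python
from statistics import mean

# Heart rates (beats per minute) copied from Table 1, subjects 1-10.
before = [60, 76, 62, 78, 70, 66, 70, 74, 78, 68]
after = [84, 80, 90, 110, 92, 92, 88, 80, 100, 88]

avg_before = mean(before)          # average of the "before" column
avg_after = mean(after)            # average of the "after" column
rise = avg_after - avg_before      # change in the average after exercise

print(f"Average before: {avg_before:.1f}")
print(f"Average after:  {avg_after:.1f}")
print(f"Average rise:   {rise:.1f}")
```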
4. Simple Data Analysis
Mean (average): sum of all measurements divided by the total # of measurements (duh…)
Median: the middle number in a series of measurements arranged in numerical order
Range: the difference between the highest and lowest values in a series of measurements
Example
Data set: 2 4 5 7 10
Mean: (2+4+5+7+10)/5 = 5.6
Median: middle number = 5
Range: 10 – 2 = 8
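The same three calculations can be sketched in Python using the standard library's statistics module. The variable names are illustrative; the data set is the one from the example.

```python
from statistics import mean, median

data = [2, 4, 5, 7, 10]  # example data set from the slide

data_mean = mean(data)               # sum of measurements / number of measurements
data_median = median(data)           # middle value once the data are sorted
data_range = max(data) - min(data)   # highest value minus lowest value

print("Mean:", data_mean)
print("Median:", data_median)
print("Range:", data_range)
```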
5. More Analysis
The Q-Test
– Used to determine if a data point should be left out of analysis calculations
– Example: data set includes 45, 48, 52, 43, 89, 56, 48, 47, 44, 51, 50
(One of these things is not like the others…)
A Q-test decides if the analysis of the data set should include the 89 or not
6. Q-Test
Q = gap / range
Gap: distance between the outlier and nearest data point
Range: difference between the highest and lowest values
45, 48, 52, 43, 89, 56, 48, 47, 44, 51, 50
(It helps to put the data points in numerical order)
Q = (89 - 56) / (89 - 43) = 33 / 46 = .717
So what do we do with this number?
7. Q-Test
Use a Q-table for the expected Q value
N = number of data points
N-1   Q-value
3     .94
4     .76
5     .64
6     .56
7     .51
8     .47
9     .44
10    .41
N-1 = 10
If calculated Q value is greater than expected Q value, discard the data point
Qcalc = .717 > Qexp = .41
Discard point 89
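The Q-test steps can be sketched as a small Python function. The helper name q_test is hypothetical, and so is its rule for choosing the suspect point (whichever end of the sorted data has the larger gap). The expected Q values are copied from the slide's table and looked up by N-1, as the slide does.

```python
# Expected Q values copied from the slide's Q-table, keyed by N - 1
# exactly as the slide labels them.
Q_TABLE = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56, 7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}

def q_test(values):
    """Return (suspect, q_calc, q_expected, discard) for the most extreme point.

    Assumes the values are not all identical and that N - 1 is in Q_TABLE.
    """
    ordered = sorted(values)                 # numerical order makes the gaps easy to read
    data_range = ordered[-1] - ordered[0]    # highest minus lowest
    low_gap = ordered[1] - ordered[0]        # gap if the lowest point is the outlier
    high_gap = ordered[-1] - ordered[-2]     # gap if the highest point is the outlier
    if high_gap >= low_gap:
        suspect, gap = ordered[-1], high_gap
    else:
        suspect, gap = ordered[0], low_gap
    q_calc = gap / data_range
    q_expected = Q_TABLE[len(ordered) - 1]
    return suspect, q_calc, q_expected, q_calc > q_expected

data = [45, 48, 52, 43, 89, 56, 48, 47, 44, 51, 50]
suspect, q_calc, q_expected, discard = q_test(data)
print(f"Suspect point: {suspect}, Qcalc = {q_calc:.3f}, Qexp = {q_expected}")
print("Discard" if discard else "Keep")
```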
8. The last and most useful type of analysis
The T-Test
• Determines if the averages of two sets of results are statistically different from each other, thus allowing for a confident conclusion to be made
• The chance that the results are due to coincidence must be below 5%
9. Say what?
Statistically different: t-test result is less than 0.05
What this means: if results are statistically different, there is less than a 5% chance the results are coincidence, so your hypothesis is more likely to be supported
Calculate a t-test value for 2 sets of data and compare it to 0.05
10. Types of Data in a T-Test
• Tails:
– One-tailed: experimenter has expected results (one group being higher/lower than another)
– Two-tailed: experimenter only assumes a difference in results
• Paired/Two-Sample
– Paired: same group used in each experiment; dependent (before and after)
– Two-Sample: two separate groups; independent (men vs. women)
11. T-Test Formula
In words: the mean of the first set minus the mean of the second set, divided by the square root of the sum of each group's variance divided by the number of results in that group.
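Written in symbols, the sentence above might look like the following. The notation is an assumption rather than taken from the slide: x̄ is a group's mean, s² its variance, n its number of results, and the subscripts 1 and 2 mark the two groups. This is the two-sample form, where each group keeps its own variance.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}
         {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```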
That’s a crap load of math, so we’ll use Excel
12. Using Microsoft Excel
Open the program and
create a new workbook.
Under “View” choose to
see the “Formula Builder”
13. T-Test using Microsoft Excel
Type your data in,
using one column
for each group of
results:
14. T-Test using Microsoft Excel
• Find the average for each set of data:
– Select the group of data
– Click on the equal (=) sign at the top of the
screen
– A window unfolds that looks like this:
15. T-Test using Microsoft Excel
• Select “average” from the pull-down menu,
and a screen appears:
16. T-Test using Microsoft Excel
• To take a t-test, choose an empty cell and enter a “=” which will bring up the formula builder.
• If “TTEST” isn’t on the list
of functions, search for it at
the top of the builder.
• Double click on “TTEST”
17. T-Test using Microsoft Excel
Fill in the required data:
• Each of the categories is described
• Array = group of data (highlight the column to select group; don’t include any headings)
• Tails = one or two tailed (1 or 2)
• Type = paired or two-sample (1 = paired, 2 = two-sample with equal variance, 3 = two-sample with unequal variance)
And the answer just appears…
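For readers without Excel, here is a rough Python sketch of a paired, two-tailed t-test on the Table 1 heart rates. It corresponds to choosing Tails = 2 and Type = 1 in TTEST. The function names are hypothetical, and integrating the t distribution numerically with Simpson's rule is an assumption made to keep the example within the standard library; it is not a description of how Excel computes its answer.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed probability, using Simpson's rule on the density from 0 to |t|."""
    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                   # probability of landing between 0 and |t|
    return max(0.0, 1 - 2 * area)          # what is left in both tails

def paired_t_test(first, second):
    """Return (t, p) for a paired test: the same subjects measured twice."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, two_tailed_p(t, n - 1)

# Heart rates copied from Table 1 (before and after exercise).
before = [60, 76, 62, 78, 70, 66, 70, 74, 78, 68]
after = [84, 80, 90, 110, 92, 92, 88, 80, 100, 88]

t_value, p_value = paired_t_test(before, after)
print(f"t = {t_value:.3f}, p = {p_value:.6f}")
print("Statistically different" if p_value < 0.05 else "Not statistically different")
```

The value to compare with 0.05 is p_value, which plays the same role as the number TTEST puts in the cell.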
18. Tips for a Better T-Test
• The more data points you collect, the more reliable the t-test result.
• If you have several sets of results, perform a t-test on each pair of sets.
• The columns of data can also be used to generate graphs if the lab calls for it.
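Comparing several sets "versus each other" means one test per pair of sets. The sketch below lists every pair and computes the two-sample t statistic for each, using the formula described on the T-Test Formula slide. The group names and numbers are placeholders invented for illustration, not data from these notes, and two_sample_t is a hypothetical helper. It returns only the t statistic, which would still need converting to a probability (as TTEST does) before comparing with 0.05.

```python
import math
from itertools import combinations
from statistics import mean, variance

def two_sample_t(first, second):
    """t statistic: difference of means over sqrt(var1/n1 + var2/n2)."""
    standard_error = math.sqrt(
        variance(first) / len(first) + variance(second) / len(second)
    )
    return (mean(first) - mean(second)) / standard_error

# Placeholder numbers for illustration only; they are NOT from the slides.
groups = {
    "group_a": [2, 4, 6],
    "group_b": [1, 2, 3],
    "group_c": [5, 7, 9],
}

# One t statistic for every unordered pair of groups.
results = {
    (name_1, name_2): two_sample_t(groups[name_1], groups[name_2])
    for name_1, name_2 in combinations(groups, 2)
}

for pair, t in results.items():
    print(pair, round(t, 3))
```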