1. Labs & assignments
Lab activities will parallel lecture material (to the extent
possible), and handout materials will be used as
appropriate.
All lab assignments must be submitted via
Blackboard one week after the assigned date unless
otherwise noted by the instructor.
No duplicated labs!
2. Biol 205: Lab 1
Ecological Data & Descriptive Statistics
Dr. Davenport
3. Objectives
What is statistics, and why do we need it?
What is data?
Basic principle of statistics: the relationship
between a (statistical) population and a sample
Descriptive Statistics
Assignment and Questions
5. Statistics
is a branch of applied mathematics that helps us
to make intelligent judgements and informed
decisions in the presence of uncertainty and
variation.
• Useful in the planning of experiments and
studies that will result in meaningful data.
• Provides a set of tools to extract and
understand information resulting from
experiments.
6. Data is:
• a collection of facts from which conclusions may be
drawn;
• a representation of facts, concepts, or instructions in a
formal manner suitable for communication,
interpretation, or processing by human beings or by
computers;
• a formal representation of raw material from which
information is constructed via processing or
interpretation.
7. Why do you need data?
Basic principle of statistics
Data are essential for presenting, summarizing,
and interpreting ecological phenomena.
However, it is usually impossible or impractical
to monitor the entire habitat or to obtain
measurements of all the organisms in a given
area.
So, most of the time, only part of the population is
sampled when you acquire a set of data.
8. Population
The entire group of individuals is called the
population.
For example, a researcher may be interested in
the relation between class size (variable 1) and
academic performance (variable 2) for a
population of third-grade children.
9. Sample
Usually populations are so large that a researcher
cannot examine the entire group. Therefore, a
sample (subset of population) is selected to
represent the population in a research study.
The goal is to use the results obtained from the
sample to infer information about the
population.
11. Summary
Population: the set of all measurements
of interest.
Sample: a subset of measurements of
interest to the investigator.
[Figure: diagram relating Population, Sample, and Statistics]
12. Selecting Samples
Samples should be taken at random.
Why?
Random sampling implies that each measurement in
the population has an equal opportunity of being
selected as part of your sample.
Otherwise, your samples could be biased.
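A minimal sketch of simple random sampling in Python, assuming a hypothetical population of 1,000 numbered measurements (the values, the seed, and the sample size of 50 are illustrative and not part of the lab data). `random.sample` draws without replacement, so every measurement has the same chance of being selected.

```python
import random
import statistics

# Hypothetical population: 1,000 measurements (illustrative values only).
population = list(range(1, 1001))

rng = random.Random(205)             # fixed seed so the draw is repeatable
sample = rng.sample(population, 50)  # equal chance for each measurement, no repeats

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean = {population_mean}")
print(f"sample mean     = {sample_mean:.1f}")
```

The sample mean will usually land near the population mean but will not equal it exactly, which is why the sample is used to infer, not to measure, the population value.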
13. Sampling Replication
Why do we need replication?
A single measurement is generally insufficient to draw
a conclusion about a population.
15. Low Birth Weight Data
Abbreviation  Variable
ID            Identification Code
LOW           Low Birth Weight (0 = Birth Weight >= 2500g, 1 = Birth Weight < 2500g)
AGE           Age of the Mother in Years
LWT           Weight in Pounds at the Last Menstrual Period
RACE          Race (1 = White, 2 = Black, 3 = Other)
SMOKE         Smoking Status During Pregnancy (1 = Yes, 0 = No)
PTL           History of Premature Labor (0 = None, 1 = One, etc.)
HT            History of Hypertension (1 = Yes, 0 = No)
UI            Presence of Uterine Irritability (1 = Yes, 0 = No)
FTV           Number of Physician Visits During the First Trimester (0 = None, 1 = One, 2 = Two, etc.)
BWT           Birth Weight in Grams
18. Tabulations
Tables are used to describe qualitative data. The
tables simply present the counts, or frequencies,
observed in each category of a variable of interest.
Race    Count    %
White      96   51
Black      26   14
Other      67   35
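As a rough illustration, the percentage column can be derived from the counts. The sketch below uses the three race counts from this slide; the variable names are illustrative.

```python
# Counts taken from the Race tabulation on this slide.
counts = {"White": 96, "Black": 26, "Other": 67}

total = sum(counts.values())  # total number of observations
percents = {race: round(100 * n / total) for race, n in counts.items()}

for race, n in counts.items():
    print(f"{race:<6} {n:>5} {percents[race]:>4}%")
```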
22. Summary Statistics
Measures of Center (Central Tendency)
Mean
Median
Mode
Measures of Spread (Variability)
Range
Variance
Standard Deviation
23. Mean
The mean of a data set is the average of all the data values.
If the data are from a sample, the mean is denoted by $\bar{x}$:
$$\bar{x} = \frac{\sum x_i}{n}$$
If the data are from a population, the mean is denoted
by $\mu$ (“mu”):
$$\mu = \frac{\sum x_i}{N}$$
24. Measures of Center
Mean (average): sum of sampled values
divided by the number of samples taken.
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
$n$ = sample size
$X_i$ = sampled value
$\sum$ = symbol for summation
$\mu$ = population mean
$\bar{X}$ = sample mean
25. Measures of Center
Example:
30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{1}{15}(30 + 26 + \cdots + 37) = 32.47$$
Note: The mean is sensitive to extreme values.
30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37, 113
$$\bar{X} = 37.50$$
How do extreme values affect the mean?
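The two means can be checked with a short Python sketch. It uses the 15 example values from this slide and then appends the extreme value 113; `statistics.mean` comes from the standard library.

```python
import statistics

data = [30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37]

mean_original = sum(data) / len(data)              # definition: sum of values / n
mean_with_outlier = statistics.mean(data + [113])  # same data plus one extreme value

print(round(mean_original, 2))  # 32.47
print(mean_with_outlier)        # 37.5
```

A single extreme value moves the mean by about five units here.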
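26. Measures of Center
Median: the value of a set of measurements
that falls in the middle position when the
data are ordered from smallest to largest.
Writing the median as $\tilde{X}$ and the $k$-th ordered value as $x_{[k]}$:
$$\tilde{X} = \begin{cases} x_{[(n+1)/2]} & \text{if } n \text{ is odd} \\ \dfrac{x_{[n/2]} + x_{[n/2+1]}}{2} & \text{if } n \text{ is even} \end{cases}$$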
27. Measures of Center
16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52
n = 15 is odd, so the 8th value is the median: 30.
Why the 8th? (15 + 1)/2 = 8
16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52, 113
How do extreme values affect the median?
Now n = 16, so the median is the average of the 8th and 9th values,
(30 + 31)/2 = 30.5, which is not much different from
the original data!
28. Measures of Center
Mode: the value of a set of measurements
that occurs most frequently.
In our example data, the mode is 26 (it occurs twice):
16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52
Fact: For data that are
symmetric and unimodal, the
mean, median, and mode are
similar.
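One way to find the mode is to count how often each value occurs. This sketch uses `collections.Counter` on the example data; the variable names are illustrative.

```python
from collections import Counter

data = [16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52]

frequencies = Counter(data)  # value -> number of occurrences
mode_value, mode_count = frequencies.most_common(1)[0]

print(mode_value, mode_count)  # 26 2
```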
29. Measures of Spread
Range: the difference between the largest
and smallest sample measurements.
In our example, the range is 36:
16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52
R = 52 - 16 = 36
Note: Two data sets may have
the same range, but very
different shape and variability.
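The sketch below computes the range of the example data and then compares two small data sets that share that range. The lists `clustered` and `even_spread` are hypothetical values made up for illustration, not lab data.

```python
import statistics

def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

data = [16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52]
print(value_range(data))  # 36

# Hypothetical data sets (not from the lab): same range, different spread.
clustered = [16, 34, 34, 34, 52]
even_spread = [16, 25, 34, 43, 52]

print(value_range(clustered), value_range(even_spread))  # 36 36
print(statistics.stdev(clustered) < statistics.stdev(even_spread))  # True
```

The range looks only at the two extreme values, so it cannot tell these two data sets apart.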
30. Measures of Spread
Sum of squared deviations from the mean,
which is referred to simply as the sum of
squares (SS):
$$SS = \sum_{i=1}^{n} (X_i - \bar{X})^2$$
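A short sketch of the sum of squares for the 15 example values used throughout this lab (variable names are illustrative):

```python
data = [16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52]

mean = sum(data) / len(data)
ss = sum((x - mean) ** 2 for x in data)  # squared deviation of each value, summed

print(round(ss, 2))  # 1581.73
```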
31. Measures of Spread
Variance ($s^2$): the sum of the squares of the
deviations divided by the sample size
minus one.
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
Standard Deviation ($s$): the square root of
the variance.
$$s = \sqrt{s^2}$$
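The two definitions translate directly into Python. This sketch applies them to the 15 example values and cross-checks the result against the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
import math
import statistics

data = [16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52]

n = len(data)
mean = sum(data) / n
variance = sum((x - mean) ** 2 for x in data) / (n - 1)  # sum of squares / (n - 1)
std_dev = math.sqrt(variance)

print(round(variance, 2))  # 112.98
print(round(std_dev, 2))   # 10.63

# Cross-check against the standard library.
print(math.isclose(variance, statistics.variance(data)))  # True
print(math.isclose(std_dev, statistics.stdev(data)))      # True
```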
33. Measures of Spread
A computationally more convenient formula
to calculate the variance:
$$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1}$$
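As a quick check that the shortcut agrees with the definition, this sketch computes the variance both ways for the 15 example values (variable names are illustrative).

```python
import math

data = [16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52]

n = len(data)
sum_x = sum(data)                  # sum of the values
sum_x2 = sum(x * x for x in data)  # sum of the squared values

# Shortcut: needs only the two running sums.
shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Definition: squared deviations from the mean.
mean = sum_x / n
definitional = sum((x - mean) ** 2 for x in data) / (n - 1)

print(sum_x, sum_x2)                         # 487 17393
print(math.isclose(shortcut, definitional))  # True
```

The shortcut suits hand calculation because it needs only two sums. In floating-point arithmetic it can lose precision when the mean is large relative to the spread, so software usually follows the definitional form.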
34. Measures of Spread
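The variance and standard deviation for our
example are:
16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52
$$s^2 = \frac{1581.73}{14} \approx 112.98$$
$$s = \sqrt{112.98} \approx 10.63$$
With the extreme value 113 included (n = 16), $s^2 = 510.8$ and $s = 22.6$.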
36. Lab 1: Assignment
As a fishery scientist working for NOAA,
you have done a lot of research on the striped bass (rockfish)
population in the Chesapeake Bay. In one of your
studies, you gathered data about the age structure
of the rockfish population in the Chesapeake Bay,
and you need to do some statistical analysis before
you can present your data to the public.
The fish samples you collected were in 3 age
groups: age 1 (1 year old), age 2 (2 years old), and
age 3 (3 years old).
37. Lab 1: Questions
1. What is a statistical population (N)? What is a
sample (n)? What is the relationship between a
statistical population and a sample? What
information does the sample (n) provide about the
statistical population (N)?
2. Write the definitions (formulas) of variance and
standard deviation.
3. Draw a bar chart and a pie chart of the
number of fish in each age group
(the age structure of your sample).
38. Lab 1: Questions (continued)
4. What is the average weight of the fish in your
entire sample?
5. What are the average weights of the fish in
each age group (age 1, age 2, and age 3)?
6. What is the median weight of the fish in the age
1 group? What is the median weight of the
fish in the age 3 group?
7. What is the range of the weights of the fish in the
age 2 group?
39. Lab 1: Questions (continued)
8. Calculate the variance of the weights of the fish
in the age 2 group.
9. Calculate the standard deviation of the weights
of the fish in the age 1 group.