Basic Statistics - Concepts and Examples


• Data sources:
  – Data Reduction and Error Analysis for the
    Physical Sciences, Bevington, 1969

  – The Statistics HomePage:
  http://www.statsoftinc.com/textbook/stathome.html
Elementary Concepts
•   Variables: Variables are things that we measure, control, or manipulate
    in research. They differ in many respects, most notably in the role they
    are given in our research and in the type of measures that can be
    applied to them.

•   Observational vs. experimental research. Most empirical research
    belongs clearly to one of those two general categories. In
    observational research we do not (or at least try not to) influence any
    variables but only measure them and look for relations (correlations)
    between some set of variables. In experimental research, we
    manipulate some variables and then measure the effects of this
    manipulation on other variables.

•   Dependent vs. independent variables. Independent variables are those
    that are manipulated whereas dependent variables are only measured
    or registered.
Variable Types and Information Content
Measurement scales. Variables differ in "how well" they can be measured. Measurement
error is involved in every measurement, and it determines the "amount of information” obtained.
Another factor is the variable’s "type of measurement scale."


•    Nominal variables allow for only qualitative classification. That is, they can be
     measured only in terms of whether the individual items belong to some distinctively
     different categories, but we cannot quantify or even rank order those categories. Typical
     examples of nominal variables are gender, race, color, city, etc.
•    Ordinal variables allow us to rank order the items we measure in terms of which has
     less and which has more of the quality represented by the variable, but still they do not
     allow us to say "how much more.” A typical example of an ordinal variable is the
     socioeconomic status of families.
•    Interval variables allow us not only to rank order the items that are measured, but also
     to quantify and compare the sizes of differences between them. For example,
     temperature, as measured in degrees Fahrenheit or Celsius, constitutes an interval scale.
•    Ratio variables are very similar to interval variables; in addition to all the properties of
     interval variables, they feature an identifiable absolute zero point, thus they allow for
     statements such as x is two times more than y. Typical examples of ratio scales are
     measures of time or space.

     Most statistical data analysis procedures do not distinguish between the
            interval and ratio properties of the measurement scales.
Systematic and Random Errors
• Error: Defined as the difference between a
  calculated or observed value and the “true” value
   – Blunders: Usually apparent either as obviously incorrect data
     points or results that are not reasonably close to the expected
     value. Easy to detect.

   – Systematic Errors: Errors that occur reproducibly from faulty
     calibration of equipment or observer bias. Statistical analysis is
     generally not useful here; rather, corrections must be made based on
     the experimental conditions.

   – Random Errors: Errors that result from fluctuations in the
     observations. They require that experiments be repeated a sufficient
     number of times to establish the precision of measurement.
Accuracy vs. Precision
• Accuracy: A measure of how close an
  experimental result is to the true value.

• Precision: A measure of how exactly the result is
  determined. It is also a measure of how
  reproducible the result is.

   – Absolute precision: indicates the uncertainty in the
     same units as the observation
   – Relative precision: indicates the uncertainty in terms of
     a fraction of the value of the result
Uncertainties
• In most cases, cannot know what the “true” value is unless
  there is an independent determination (i.e. different
  measurement technique).

• Only can consider estimates of the error.

• Discrepancy is the difference between two or more
  observations. This gives rise to uncertainty.

• Probable Error: Indicates the magnitude of the error we
  estimate to have made in the measurements. It means that if
  we make a measurement, we “probably” won’t be wrong by
  more than that amount.
Parent vs. Sample Populations

• Parent population: Hypothetical probability distribution if we
  were to make an infinite number of measurements of some variable or
  set of variables.

• Sample population: Actual set of experimental observations or
  measurements of some variable or set of variables.
• In General:
   (Parent Parameter) = lim (Sample Parameter)
                        N→∞

   i.e., as the number of observations, N, goes to infinity.
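This limit can be illustrated numerically. The sketch below (an illustration, not from the source) simulates fair-coin tosses, whose parent mean is 0.5, and shows the sample mean approaching that parent parameter as N grows:

```python
import random

random.seed(42)

def sample_mean(n):
    """Mean of n simulated fair-coin tosses (1 = heads, 0 = tails)."""
    return sum(random.randint(0, 1) for _ in range(n)) / n

# The sample parameter approaches the parent parameter (0.5) as N grows.
for n in (10, 1000, 100000):
    print(n, sample_mean(n))
```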
some univariate statistical terms:
  mode: value that occurs most frequently in a distribution
          (usually the highest point of curve)
         may have more than one mode in a dataset

  median: value midway in the frequency distribution
            …half the area of the curve is to the right and half to the left

  mean: arithmetic average
            …sum of all observations divided by # of observations
      poor measure of central tendency in skewed distributions

  range: measure of dispersion about mean
                (maximum minus minimum)
       when max and min are unusual values, range may be
             a misleading measure of dispersion
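The four terms above can be sketched with Python's standard library (the data values are hypothetical; the outlier 21 makes the sample skewed):

```python
from statistics import mean, median, multimode

# A small, right-skewed sample (hypothetical values for illustration).
data = [2, 3, 3, 4, 5, 6, 21]

mode_vals = multimode(data)      # most frequent value(s); may be several
med = median(data)               # half the observations on each side
avg = mean(data)                 # pulled upward by the outlier 21
rng = max(data) - min(data)      # misleading here: driven by one unusual value

print(mode_vals, med, avg, rng)
```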
Distribution vs. Sample Size




histogram is a useful graphic representation of the
   information content of a sample or parent population

                                 many statistical tests assume
                                 values are normally distributed

                                      not always the case!
                                      examine data prior
                                         to processing

                                                 from: Jensen, 1996
Deviations
The deviation, di, of any measurement xi from the mean µ of the
parent distribution is defined as the difference between xi and µ:

                         di ≡ xi − µ



Average deviation, α, is defined as the average of the magnitudes of
the deviations, i.e. the average of the absolute values of the
deviations:
                                     N
                 α ≡ lim  [ 1/N     Σ   |xi − µ| ]
                    N→∞             i=1
variance: average squared deviation of all possible observations
              from a sample mean (calculated from sum of squares)
                                      N
                  σ² = lim [ 1/N     Σ  (xi − µ)² ]
                      N→∞            i=1

                                  where: µ is the mean,
                                         xi is an observed value, and
                                         N is the number of observations
                          N
                  s² = [  Σ  (xi − x̄)² ] / (N − 1)
                         i=1
                                           The divisor is decreased from N to
                                           N − 1 for the “sample” variance
                                           because the mean estimated from
                                           the data is used in the calculation.

standard deviation: positive square root of the variance
       small std dev: observations are clustered tightly
                              around a central value
       large std dev: observations are scattered widely
                              about the mean
Sample Mean and Standard Deviation
For a series of N observations, the most probable estimate of the
mean µ is the average x̄ of the observations. We refer to this as
the sample mean x̄ to distinguish it from the parent mean µ.

                   µ ≅ x̄ ≡ (1/N) Σ xi                  Sample Mean

   Our best estimate of the standard deviation σ would be from:

                   σ² ≅ (1/N) Σ (xi − µ)² = (1/N) Σ xi² − µ²

 But we cannot know the true parent mean µ, so the best estimate
 of the sample variance and standard deviation would be:

                   σ² ≅ s² ≡ [1/(N − 1)] Σ (xi − x̄)²   Sample Variance
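A minimal sketch of the sample mean and the N − 1 sample standard deviation (the measurement values are hypothetical):

```python
import math

def sample_stats(xs):
    """Sample mean and sample standard deviation (N - 1 in the denominator,
    because the sample mean itself is estimated from the same data)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return xbar, math.sqrt(s2)

# Hypothetical repeated measurements of one quantity:
xbar, s = sample_stats([9.8, 10.1, 10.0, 9.9, 10.2])
print(xbar, s)
```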
Distributions
• Binomial Distribution: Allows us to define the
  probability of observing a specific combination of x out of n
  items; it is derived from the fundamental formulas for
  permutations and combinations.

   – Permutations: Enumerate the number of permutations,
     Pm(n,x), of coin flips, when we pick up the coins one at
     a time from a collection of n coins and put x of them
     into the “heads” box.
           n! ≡ n(n − 1)(n − 2) … (3)(2)(1)

           Pm(n, x) = n! / (n − x)!

           1! = 1,    0! = 1
                                             !
Distributions - con’t.

– Combinations: Relates to the number of ways we can
  combine the various permutations enumerated above
  from our coin flip experiment. Thus the number of
  combinations is equal to the number of permutations
  divided by the degeneracy factor x! of the permutations.


           C(n, x) = Pm(n, x) / x! = n! / [x! (n − x)!] ≡ (n choose x)
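The factorial, permutation, and combination formulas above can be checked against Python's standard library (n = 4 coins and x = 2 heads are arbitrary example values):

```python
from math import factorial, comb, perm

n, x = 4, 2  # e.g. 4 coins, 2 of them heads up

Pm = factorial(n) // factorial(n - x)   # ordered selections: n!/(n-x)!
C = Pm // factorial(x)                  # divide out the x! degeneracy

print(Pm, C)                            # 12 ordered ways, 6 combinations
assert Pm == perm(n, x) and C == comb(n, x)  # matches the stdlib helpers
```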
Probability and the Binomial Distribution
 Coin Toss Experiment: The probability p of success (landing heads up)
 is not necessarily equal to the probability q = 1 − p of failure
 (landing tails up), because the coins may be lopsided!

 The probability for each of the combinations of x coins heads up and
 n − x coins tails up is equal to p^x q^(n−x). The binomial distribution can
 be used to calculate the probability:
       PB(x, n, p) = [ n! / (x! (n − x)!) ] p^x q^(n−x)
                   = (n choose x) p^x (1 − p)^(n−x)
The coefficients PB(x,n,p) are closely related to the binomial theorem
for the expansion of a power of a sum:
                             n
                (p + q)^n =  Σ  (n choose x) p^x q^(n−x)
                            x=0
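A sketch of the binomial probability PB(x, n, p); by the binomial theorem with q = 1 − p, the probabilities over x = 0 … n sum to (p + q)^n = 1 (n = 10 and p = 0.5 are arbitrary example values):

```python
from math import comb

def binom_pmf(x, n, p):
    """PB(x, n, p): probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.5
probs = [binom_pmf(x, n, p) for x in range(n + 1)]

# By the binomial theorem, (p + q)^n = 1^n, so the probabilities sum to 1.
print(sum(probs))
```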
Mean and Variance: Binomial Distribution
 The mean µ of the binomial distribution is evaluated by combining the
 definition of µ with the function that defines the probability, yielding:

            n
        µ = Σ  x [ n! / (x! (n − x)!) ] p^x (1 − p)^(n−x) = np
           x=0
The average of the number of successes will approach a mean value µ
given by the probability for success of each item p times the number of
items. For the coin toss experiment p=1/2, half the coins should land
heads up on average.
             n
        σ² = Σ  (x − µ)² [ n! / (x! (n − x)!) ] p^x (1 − p)^(n−x) = np(1 − p)
            x=0
If the probability for a single success p is equal to the probability for
failure, p = q = 1/2, the final distribution is symmetric about the mean,
and the mode and median equal the mean. The variance is then σ² = µ/2.
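The results µ = np and σ² = np(1 − p) can be verified numerically from the probability function itself (n = 12 and p = 0.3 are arbitrary example values):

```python
from math import comb

def binom_pmf(x, n, p):
    """PB(x, n, p): probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 12, 0.3

# Weight each outcome x by its probability, as in the sums above.
mu = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mu) ** 2 * binom_pmf(x, n, p) for x in range(n + 1))

print(mu, var)  # should match np and np(1 - p)
```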
Other Probability Distributions: Special Cases
• Poisson Distribution: An approximation to the
  binomial distribution for the special case when the average
  number of successes is very much smaller than the possible
  number, i.e. µ << n, because p << 1.
  – Important for the study of such phenomena as radioactive decay.
    The distribution is NOT necessarily symmetric! Data are usually
    bounded on one side and not the other. An advantage is that σ² = µ.



                [Figure: example Poisson distributions with
                 µ = 1.67, σ = 1.29 and µ = 10.0, σ = 3.16]
Gaussian or Normal Error Distribution Details
• Gaussian Distribution: The most important probability
   distribution in the statistical analysis of experimental data. Its
   functional form is relatively simple and the resultant
   distribution is reasonable. Again, this is a special limiting
   case of the binomial distribution, where the number of
   possible different observations, n, becomes infinitely large
   and np >> 1.
    – Most probable estimate of the mean µ from a random sample of
      observations is the average of those observations!
     Γ = 2.354σ    (full width at half maximum)

     P.E. = 0.6745σ = 0.2865 Γ

                                    Probable Error (P.E.) is defined as the
                                    absolute value of the deviation such
                                    that the probability PG of the deviation
                                    of any random observation exceeding it
                                    is < 1/2.

                                    A tangent along the steepest portion
                                    of the probability curve intersects the
                                    curve at e^(−1/2) of its maximum and
                                    intersects the x axis at the points
                                    x = µ ± 2σ.
For gaussian or normal error distributions:

       Total area underneath curve is 1.00 (100%)

       68.27% of observations lie within ± 1 std dev of mean
       95.45% of observations lie within ± 2 std dev of mean
       99.73% of observations lie within ± 3 std dev of mean

   Variance, standard deviation, probable error, mean, and
   weighted root mean square error are commonly used
   statistical terms in geodesy.

compare values (rather than attach significance to a numerical value)
Gaussian Details, con’t.
The probability function for the Gaussian distribution is defined as:

               PG(x; µ, σ) = [ 1 / (σ √(2π)) ] exp[ −(1/2) ((x − µ)/σ)² ]

 The integral probability evaluated between the limits µ ± zσ,
 where z = |x − µ|/σ is the dimensionless range, is:

                       µ+zσ                             +z
       AG(x; µ, σ) =    ∫   PG(x; µ, σ) dx = (1/√(2π))  ∫  e^(−u²/2) du
                       µ−zσ                             −z

       AG(z = ∞) = 1
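A sketch of PG and AG in Python; AG can be written in terms of the standard error function, erf, which reproduces the areas within ±1, ±2, and ±3 standard deviations quoted earlier:

```python
from math import erf, exp, pi, sqrt

def p_gauss(x, mu, sigma):
    """Gaussian probability density PG(x; mu, sigma)."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def a_gauss(z):
    """Integral probability within mu +/- z*sigma, via the error function."""
    return erf(z / sqrt(2))

for z in (1, 2, 3):
    print(z, a_gauss(z))   # ~0.6827, 0.9545, 0.9973
```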
Gaussian Density vs. Distribution Functions



Lorentzian or Cauchy Distribution
• Lorentzian Distribution: A similar distribution
  function, but unrelated to the binomial distribution. Useful for
  describing data related to resonance phenomena, with
  particular applications in nuclear physics (e.g. the Mössbauer
  effect). The distribution is symmetric about µ.
  – A distinctly different probability distribution from the Gaussian
    function. The mean and standard deviation are not simply defined.

          PL(x; µ, Γ) = (1/π) (Γ/2) / [ (x − µ)² + (Γ/2)² ]

          AL(x; µ, Γ) = ∫ PL(x; µ, Γ) dx      (from −∞ to +∞)
                      = (1/π) ∫ dz / (1 + z²) = 1

          where z = (x − µ) / (Γ/2)

          Γ = Full Width at Half-Maximum
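A sketch of the Lorentzian density, confirming that it falls to half its peak at µ ± Γ/2 and that its total area is 1. The normalization check below integrates numerically over a finite range, so it comes out slightly under 1 because of the distribution's heavy tails:

```python
from math import pi

def p_lorentz(x, mu, gamma):
    """Lorentzian (Cauchy) density with full width at half maximum gamma."""
    half = gamma / 2
    return (1 / pi) * half / ((x - mu) ** 2 + half**2)

mu, gamma = 0.0, 2.0
peak = p_lorentz(mu, mu, gamma)

# Half maximum is reached at mu +/- gamma/2, confirming gamma is the FWHM.
print(peak, p_lorentz(mu + gamma / 2, mu, gamma))

# Crude Riemann-sum normalization check over a wide but finite range.
step = 0.01
area = sum(p_lorentz(-500 + i * step, mu, gamma) * step for i in range(100000))
print(area)  # close to, but slightly below, 1
```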

Más contenido relacionado

La actualidad más candente

Review on probability distributions, estimation and hypothesis testing
Review on probability distributions, estimation and hypothesis testingReview on probability distributions, estimation and hypothesis testing
Review on probability distributions, estimation and hypothesis testingMeselu Mellaku
 
Probability Distributions
Probability DistributionsProbability Distributions
Probability Distributionsrishi.indian
 
Statement of stochastic programming problems
Statement of stochastic programming problemsStatement of stochastic programming problems
Statement of stochastic programming problemsSSA KPI
 
Normal distribution
Normal distributionNormal distribution
Normal distributionAlok Roy
 
A simple numerical procedure for estimating nonlinear uncertainty propagation
A simple numerical procedure for estimating nonlinear uncertainty propagationA simple numerical procedure for estimating nonlinear uncertainty propagation
A simple numerical procedure for estimating nonlinear uncertainty propagationISA Interchange
 
Discrete Probability Distributions
Discrete  Probability DistributionsDiscrete  Probability Distributions
Discrete Probability DistributionsE-tan
 
Chapter 2 Probabilty And Distribution
Chapter 2 Probabilty And DistributionChapter 2 Probabilty And Distribution
Chapter 2 Probabilty And Distributionghalan
 
Basics of probability in statistical simulation and stochastic programming
Basics of probability in statistical simulation and stochastic programmingBasics of probability in statistical simulation and stochastic programming
Basics of probability in statistical simulation and stochastic programmingSSA KPI
 
MTH120_Chapter9
MTH120_Chapter9MTH120_Chapter9
MTH120_Chapter9Sida Say
 
Chapter 3 projection
Chapter 3 projectionChapter 3 projection
Chapter 3 projectionNBER
 
Les outils de modélisation des Big Data
Les outils de modélisation des Big DataLes outils de modélisation des Big Data
Les outils de modélisation des Big DataKezhan SHI
 
Density Curves and Normal Distributions
Density Curves and Normal DistributionsDensity Curves and Normal Distributions
Density Curves and Normal Distributionsnszakir
 
Lesson 23: The Definite Integral (slides)
Lesson 23: The Definite Integral (slides)Lesson 23: The Definite Integral (slides)
Lesson 23: The Definite Integral (slides)Matthew Leingang
 

La actualidad más candente (19)

Poisson statistics
Poisson statisticsPoisson statistics
Poisson statistics
 
Review on probability distributions, estimation and hypothesis testing
Review on probability distributions, estimation and hypothesis testingReview on probability distributions, estimation and hypothesis testing
Review on probability distributions, estimation and hypothesis testing
 
Probability Distributions
Probability DistributionsProbability Distributions
Probability Distributions
 
Statement of stochastic programming problems
Statement of stochastic programming problemsStatement of stochastic programming problems
Statement of stochastic programming problems
 
Normal distribution
Normal distributionNormal distribution
Normal distribution
 
7주차
7주차7주차
7주차
 
Guia de estudio para aa5
Guia de estudio  para aa5 Guia de estudio  para aa5
Guia de estudio para aa5
 
A simple numerical procedure for estimating nonlinear uncertainty propagation
A simple numerical procedure for estimating nonlinear uncertainty propagationA simple numerical procedure for estimating nonlinear uncertainty propagation
A simple numerical procedure for estimating nonlinear uncertainty propagation
 
Discrete Probability Distributions
Discrete  Probability DistributionsDiscrete  Probability Distributions
Discrete Probability Distributions
 
Chapter 2 Probabilty And Distribution
Chapter 2 Probabilty And DistributionChapter 2 Probabilty And Distribution
Chapter 2 Probabilty And Distribution
 
Basics of probability in statistical simulation and stochastic programming
Basics of probability in statistical simulation and stochastic programmingBasics of probability in statistical simulation and stochastic programming
Basics of probability in statistical simulation and stochastic programming
 
MTH120_Chapter9
MTH120_Chapter9MTH120_Chapter9
MTH120_Chapter9
 
Chapter 3 projection
Chapter 3 projectionChapter 3 projection
Chapter 3 projection
 
Chapter 14 Part Ii
Chapter 14 Part IiChapter 14 Part Ii
Chapter 14 Part Ii
 
Stats chapter 7
Stats chapter 7Stats chapter 7
Stats chapter 7
 
Les outils de modélisation des Big Data
Les outils de modélisation des Big DataLes outils de modélisation des Big Data
Les outils de modélisation des Big Data
 
Density Curves and Normal Distributions
Density Curves and Normal DistributionsDensity Curves and Normal Distributions
Density Curves and Normal Distributions
 
Binomia ex 2
Binomia ex 2Binomia ex 2
Binomia ex 2
 
Lesson 23: The Definite Integral (slides)
Lesson 23: The Definite Integral (slides)Lesson 23: The Definite Integral (slides)
Lesson 23: The Definite Integral (slides)
 

Similar a Basic statistics

Stat3 central tendency & dispersion
Stat3 central tendency & dispersionStat3 central tendency & dispersion
Stat3 central tendency & dispersionForensic Pathology
 
Lect w2 measures_of_location_and_spread
Lect w2 measures_of_location_and_spreadLect w2 measures_of_location_and_spread
Lect w2 measures_of_location_and_spreadRione Drevale
 
Stat3 central tendency & dispersion
Stat3 central tendency & dispersionStat3 central tendency & dispersion
Stat3 central tendency & dispersionForensic Pathology
 
Stat3 central tendency & dispersion
Stat3 central tendency & dispersionStat3 central tendency & dispersion
Stat3 central tendency & dispersionForensic Pathology
 
Standard deviation
Standard deviationStandard deviation
Standard deviationMai Ngoc Duc
 
PG STAT 531 Lecture 2 Descriptive statistics
PG STAT 531 Lecture 2 Descriptive statisticsPG STAT 531 Lecture 2 Descriptive statistics
PG STAT 531 Lecture 2 Descriptive statisticsAashish Patel
 
Measures of dispersion or variation
Measures of dispersion or variationMeasures of dispersion or variation
Measures of dispersion or variationRaj Teotia
 
Week1 GM533 Slides
Week1 GM533 SlidesWeek1 GM533 Slides
Week1 GM533 SlidesBrent Heard
 
Point Estimate, Confidence Interval, Hypotesis tests
Point Estimate, Confidence Interval, Hypotesis testsPoint Estimate, Confidence Interval, Hypotesis tests
Point Estimate, Confidence Interval, Hypotesis testsUniversity of Salerno
 
ch-4-measures-of-variability-11 2.ppt for nursing
ch-4-measures-of-variability-11 2.ppt for nursingch-4-measures-of-variability-11 2.ppt for nursing
ch-4-measures-of-variability-11 2.ppt for nursingwindri3
 
measures-of-variability-11.ppt
measures-of-variability-11.pptmeasures-of-variability-11.ppt
measures-of-variability-11.pptNievesGuardian1
 
Biostatistics
BiostatisticsBiostatistics
Biostatisticspriyarokz
 

Similar a Basic statistics (20)

Stat3 central tendency & dispersion
Stat3 central tendency & dispersionStat3 central tendency & dispersion
Stat3 central tendency & dispersion
 
Inferential statistics-estimation
Inferential statistics-estimationInferential statistics-estimation
Inferential statistics-estimation
 
Lect w2 measures_of_location_and_spread
Lect w2 measures_of_location_and_spreadLect w2 measures_of_location_and_spread
Lect w2 measures_of_location_and_spread
 
Stat3 central tendency & dispersion
Stat3 central tendency & dispersionStat3 central tendency & dispersion
Stat3 central tendency & dispersion
 
Stat3 central tendency & dispersion
Stat3 central tendency & dispersionStat3 central tendency & dispersion
Stat3 central tendency & dispersion
 
Basic stat review
Basic stat reviewBasic stat review
Basic stat review
 
OR2 Chapter1
OR2 Chapter1OR2 Chapter1
OR2 Chapter1
 
Standard deviation
Standard deviationStandard deviation
Standard deviation
 
PG STAT 531 Lecture 2 Descriptive statistics
PG STAT 531 Lecture 2 Descriptive statisticsPG STAT 531 Lecture 2 Descriptive statistics
PG STAT 531 Lecture 2 Descriptive statistics
 
Measures of dispersion or variation
Measures of dispersion or variationMeasures of dispersion or variation
Measures of dispersion or variation
 
lecture-2.ppt
lecture-2.pptlecture-2.ppt
lecture-2.ppt
 
POINT_INTERVAL_estimates.ppt
POINT_INTERVAL_estimates.pptPOINT_INTERVAL_estimates.ppt
POINT_INTERVAL_estimates.ppt
 
Week1 GM533 Slides
Week1 GM533 SlidesWeek1 GM533 Slides
Week1 GM533 Slides
 
Point Estimate, Confidence Interval, Hypotesis tests
Point Estimate, Confidence Interval, Hypotesis testsPoint Estimate, Confidence Interval, Hypotesis tests
Point Estimate, Confidence Interval, Hypotesis tests
 
Measures of dispersion
Measures  of  dispersionMeasures  of  dispersion
Measures of dispersion
 
ch-4-measures-of-variability-11 2.ppt for nursing
ch-4-measures-of-variability-11 2.ppt for nursingch-4-measures-of-variability-11 2.ppt for nursing
ch-4-measures-of-variability-11 2.ppt for nursing
 
measures-of-variability-11.ppt
measures-of-variability-11.pptmeasures-of-variability-11.ppt
measures-of-variability-11.ppt
 
estimation
estimationestimation
estimation
 
Estimation
EstimationEstimation
Estimation
 
Biostatistics
BiostatisticsBiostatistics
Biostatistics
 

Basic statistics

  • 1. Basic Statistics - Concepts and Examples • Data sources: – Data Reduction and Error Analysis for the Physical Sciences, Bevington, 1969 – The Statistics HomePage: http://www.statsoftinc.com/textbook/stathome.html
  • 2. Elementary Concepts • Variables: Variables are things that we measure, control, or manipulate in research. They differ in many respects, most notably in the role they are given in our research and in the type of measures that can be applied to them. • Observational vs. experimental research. Most empirical research belongs clearly to one of those two general categories. In observational research we do not (or at least try not to) influence any variables but only measure them and look for relations (correlations) between some set of variables. In experimental research, we manipulate some variables and then measure the effects of this manipulation on other variables. • Dependent vs. independent variables. Independent variables are those that are manipulated whereas dependent variables are only measured or registered.
  • 3. Variable Types and Information Content Measurement scales. Variables differ in "how well" they can be measured. Measurement error involved in every measurement, which determines the "amount of information” obtained. Another factor is the variable’s "type of measurement scale." • Nominal variables allow for only qualitative classification. That is, they can be measured only in terms of whether the individual items belong to some distinctively different categories, but we cannot quantify or even rank order those categories. Typical examples of nominal variables are gender, race, color, city, etc. • Ordinal variables allow us to rank order the items we measure in terms of which has less and which has more of the quality represented by the variable, but still they do not allow us to say "how much more.” A typical example of an ordinal variable is the socioeconomic status of families. • Interval variables allow us not only to rank order the items that are measured, but also to quantify and compare the sizes of differences between them. For example, temperature, as measured in degrees Fahrenheit or Celsius, constitutes an interval scale. • Ratio variables are very similar to interval variables; in addition to all the properties of interval variables, they feature an identifiable absolute zero point, thus they allow for statements such as x is two times more than y. Typical examples of ratio scales are measures of time or space. Most statistical data analysis procedures do not distinguish between the interval and ratio properties of the measurement scales.
  • 4. Systematic and Random Errors • Error: Defined as the difference between a calculated or observed value and the “true” value – Blunders: Usually apparent either as obviously incorrect data points or results that are not reasonably close to the expected value. Easy to detect. – Systematic Errors: Errors that occur reproducibly from faulty calibration of equipment or observer bias. Statistical analysis in generally not useful, but rather corrections must be made based on experimental conditions. – Random Errors: Errors that result from the fluctuations in observations. Requires that experiments be repeated a sufficient number of time to establish the precision of measurement.
  • 5. Accuracy vs. Precision • Accuracy: A measure of how close an experimental result is to the true value. • Precision: A measure of how exactly the result is determined. It is also a measure of how reproducible the result is. – Absolute precision: indicates the uncertainty in the same units as the observation – Relative precision: indicates the uncertainty in terms of a fraction of the value of the result
  • 6. Uncertainties • In most cases, cannot know what the “true” value is unless there is an independent determination (i.e. different measurement technique). • Only can consider estimates of the error. • Discrepancy is the difference between two or more observations. This gives rise to uncertainty. • Probable Error: Indicates the magnitude of the error we estimate to have made in the measurements. Means that if we make a measurement that we “probably” won’t be wrong by that amount.
  • 7. Parent vs. Sample Populations • Parent population: Hypothetical probability distribution if we were to make an infinite number of measurements of some variable or set of variables. • Sample population: Actual set of experimental observations or measurements of some variable or set of variables. • In General: (Parent Parameter) = lim (Sample Parameter) N ->∞ When the number of observations, N, goes to infinity.
  • 8. some univariate statistical terms: mode: value that occurs most frequently in a distribution (usually the highest point of curve) may have more than one mode in a dataset median: value midway in the frequency distribution …half the area of curve is to right and other to left mean: arithmetic average …sum of all observations divided by # of observations poor measure of central tendency in skewed distributions range: measure of dispersion about mean (maximum minus minimum) when max and min are unusual values, range may be a misleading measure of dispersion
  • 9. Distribution vs. Sample Size QuickTime™ and a GIF decompressor are needed to see this picture.
  • 10. histogram is a useful graphic representation of information content of sample or parent population many statistical tests assume values are normally distributed not always the case! examine data prior to processing from: Jensen, 1996
  • 11. Deviations The deviation, di, of any measurement xi from the mean µ of the parent distribution is defined as the difference between xi and µ d xµ i≡i− Average deviation, α, is defined as the average of the magnitudes of the deviations, which is given by the absolute value of the deviations. n 1 α N ∞ ∑µ ≡m x l i i− → N1 = i
• 12. variance: average squared deviation of all possible observations from the mean (calculated from the sum of squares)
      σ² = lim_{N→∞} (1/N) Σᵢ₌₁ᴺ (xᵢ − µ)²
  where µ is the mean, xᵢ is an observed value, and N is the number of observations.
      s² = (1/(N − 1)) Σᵢ₌₁ᴺ (xᵢ − x̄)²
  The divisor decreases from N to N − 1 for the "sample" variance because the sample mean x̄ is used in the calculation (one degree of freedom is used up estimating it).
  standard deviation: positive square root of the variance
      small std dev: observations are clustered tightly around a central value
      large std dev: observations are scattered widely about the mean
• 13. Sample Mean and Standard Deviation
  For a series of N observations, the most probable estimate of the mean µ is the average of the observations. We refer to this as the sample mean x̄ to distinguish it from the parent mean µ:
      x̄ ≡ (1/N) Σᵢ₌₁ᴺ xᵢ ≅ µ     (Sample Mean)
  Our best estimate of the standard deviation σ would then come from:
      σ² ≅ (1/N) Σᵢ₌₁ᴺ (xᵢ − µ)² = (1/N) Σᵢ₌₁ᴺ xᵢ² − µ²
  But we cannot know the true parent mean µ, so the best estimate of the sample variance (and standard deviation) is:
      s² ≡ (1/(N − 1)) Σᵢ₌₁ᴺ (xᵢ − x̄)² ≅ σ²     (Sample Variance)
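The N versus N − 1 distinction maps directly onto Python's standard library, where pstdev uses the population (parent) divisor N and stdev the sample divisor N − 1; the data here are invented:

```python
from statistics import mean, pstdev, stdev

xs = [4.0, 5.0, 6.0, 5.0]   # hypothetical measurements

xbar = mean(xs)             # sample mean
sigma = pstdev(xs)          # divides by N   (parent/population form)
s = stdev(xs)               # divides by N-1 (sample form)

print(xbar)                 # 5.0
print(round(sigma**2, 6))   # 0.5      = (1/4) * 2
print(round(s**2, 6))       # 0.666667 = (1/3) * 2
```

For small N the sample form is noticeably larger; the two converge as N grows.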
• 14. Distributions
  • Binomial Distribution: Allows us to define the probability, p, of observing x of a specific combination of n items, and is derived from the fundamental formulas for permutations and combinations.
    – Permutations: Enumerate the number of permutations, Pm(n,x), of coin flips, when we pick up the coins one at a time from a collection of n coins and put x of them into the "heads" box:
          Pm(n,x) = n! / (n − x)!
      where n! ≡ n(n − 1)(n − 2)⋯(3)(2)(1) and 0! ≡ 1.
• 15. Distributions - con’t.
    – Combinations: Relates to the number of ways we can combine the various permutations enumerated above from our coin flip experiment. Thus the number of combinations is equal to the number of permutations divided by the degeneracy factor x! of the permutations:
          C(n,x) = Pm(n,x) / x! = n! / (x!(n − x)!) ≡ (n choose x)
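These counting formulas correspond directly to math.perm and math.comb in Python's standard library (the sample values n = 4, x = 2 are chosen arbitrarily):

```python
from math import comb, factorial, perm

n, x = 4, 2

pm = perm(n, x)                 # n!/(n-x)! = 12 ordered selections
c = comb(n, x)                  # n!/(x!(n-x)!) = 6 unordered selections

assert pm == factorial(n) // factorial(n - x)
assert c == pm // factorial(x)  # divide out the degeneracy factor x!

print(pm, c)                    # 12 6
```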
• 16. Probability and the Binomial Distribution
  Coin Toss Experiment: The probability p of success (landing heads up) is not necessarily equal to the probability q = 1 − p of failure (landing tails up), because the coins may be lopsided! The probability for each of the combinations of x coins heads up and n − x coins tails up is equal to pˣqⁿ⁻ˣ. The binomial distribution can be used to calculate the probability:
      P_B(x; n, p) = [n! / (x!(n − x)!)] pˣ (1 − p)ⁿ⁻ˣ = (n choose x) pˣ qⁿ⁻ˣ
  The coefficients of P_B(x; n, p) are closely related to the binomial theorem for the expansion of a power of a sum:
      (p + q)ⁿ = Σₓ₌₀ⁿ (n choose x) pˣ qⁿ⁻ˣ
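A minimal sketch of this formula for a possibly lopsided coin (p = 0.6 and n = 4 are invented example values); summing over all x recovers the binomial theorem with p + q = 1:

```python
from math import comb

def binomial_p(x, n, p):
    """P_B(x; n, p) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 4, 0.6
probs = [binomial_p(x, n, p) for x in range(n + 1)]

print(probs)                 # probability of x heads, x = 0..n
print(round(sum(probs), 9))  # (p + q)^n = 1
```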
• 17. Mean and Variance: Binomial Distribution
  The mean µ of the binomial distribution is evaluated by combining the definition of µ with the function that defines the probability, yielding:
      µ = Σₓ₌₀ⁿ x [n! / (x!(n − x)!)] pˣ (1 − p)ⁿ⁻ˣ = np
  The average number of successes will approach a mean value µ given by the probability for success of each item, p, times the number of items. For the coin toss experiment with p = 1/2, half the coins should land heads up on average.
      σ² = Σₓ₌₀ⁿ (x − µ)² [n! / (x!(n − x)!)] pˣ (1 − p)ⁿ⁻ˣ = np(1 − p)
  If the probability for a single success p is equal to the probability for failure, p = q = 1/2, the final distribution is symmetric about the mean, and the mode and median equal the mean. The variance is then σ² = µ/2.
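These closed forms can be checked against the distribution itself by computing the mean and variance directly from the pmf (a sketch with arbitrarily chosen n = 10, p = 1/2):

```python
from math import comb

n, p = 10, 0.5
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

mu = sum(x * pmf[x] for x in range(n + 1))
var = sum((x - mu)**2 * pmf[x] for x in range(n + 1))

print(round(mu, 9))   # n*p = 5.0
print(round(var, 9))  # n*p*(1-p) = 2.5, i.e. mu/2 when p = 1/2
```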
• 18. Other Probability Distributions: Special Cases
  • Poisson Distribution: An approximation to the binomial distribution for the special case when the average number of successes is very much smaller than the possible number, i.e. µ << n because p << 1.
    – Important for the study of such phenomena as radioactive decay. The distribution is NOT necessarily symmetric! Data are usually bounded on one side and not the other. An advantage is that σ² = µ.
      [figure: two example Poisson distributions, µ = 1.67 (σ = 1.29) and µ = 10.0 (σ = 3.16)]
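A sketch of the Poisson pmf, P(x; µ) = µˣe⁻µ/x!, showing how closely it approximates the binomial when p << 1 (the values n = 1000, p = 0.002, x = 3 are invented for illustration):

```python
from math import comb, exp, factorial

def poisson_p(x, mu):
    """Poisson pmf: mu^x * e^(-mu) / x!."""
    return mu**x * exp(-mu) / factorial(x)

n, p = 1000, 0.002          # p << 1, so mu = n*p = 2 << n
mu = n * p

binom = comb(n, 3) * p**3 * (1 - p)**(n - 3)
poisson = poisson_p(3, mu)

print(round(binom, 6), round(poisson, 6))  # nearly equal
```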
• 19. Gaussian or Normal Error Distribution Details
  • Gaussian Distribution: The most important probability distribution in the statistical analysis of experimental data. Its functional form is relatively simple and the resultant distribution is reasonable. Again this is a special limiting case of the binomial distribution, where the number of possible different observations, n, becomes infinitely large, yielding np >> 1.
    – The most probable estimate of the mean µ from a random sample of observations is the average of those observations!
    – The Probable Error (P.E.) is defined as the absolute value of the deviation such that the probability P_G for the deviation of any random observation to be smaller is 1/2:
          P.E. = 0.6745σ = 0.2865Γ,   where Γ = 2.354σ is the full width at half maximum.
    – A tangent along the steepest portion of the probability curve (at the inflection points, where the curve has fallen to e⁻¹ᐟ² of its peak) intersects the x axis at the points x = µ ± 2σ.
• 20. For gaussian or normal error distributions:
      Total area underneath the curve is 1.00 (100%)
      68.27% of observations lie within ± 1 std dev of the mean
      95.45% of observations lie within ± 2 std dev of the mean
      99.73% of observations lie within ± 3 std dev of the mean
  Variance, standard deviation, probable error, mean, and weighted root mean square error are commonly used statistical terms in geodesy; compare them rather than attaching significance to any single numerical value.
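These coverage fractions follow from the Gaussian integral: the area within µ ± zσ equals erf(z/√2), which Python's math module provides directly:

```python
from math import erf, sqrt

for z in (1, 2, 3):
    area = erf(z / sqrt(2))        # fraction of area within mu ± z*sigma
    print(z, round(100 * area, 2))
# 1 68.27
# 2 95.45
# 3 99.73
```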
• 21. Gaussian Details, con’t.
  The probability function for the Gaussian distribution is defined as:
      P_G(x; µ, σ) = (1/(σ√(2π))) exp[ −(1/2)((x − µ)/σ)² ]
  The integral probability evaluated between the limits µ ± zσ, where z = |x − µ|/σ is the dimensionless range, is:
      A_G(x; µ, σ) = ∫ from µ−zσ to µ+zσ of P_G(x; µ, σ) dx = (1/√(2π)) ∫ from −z to +z of e^(−z′²/2) dz′
  with A_G(∞) = 1.
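A numerical sketch: integrating P_G with a simple trapezoid rule reproduces the integral probability A_G (checked here at z = 1 against the 68.27% figure; the helper names are ours, not from the slides):

```python
from math import exp, pi, sqrt

def p_gauss(x, mu, sigma):
    """Gaussian density P_G(x; mu, sigma)."""
    return exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * sqrt(2 * pi))

def a_gauss(z, mu=0.0, sigma=1.0, steps=10_000):
    """Trapezoid-rule integral of P_G from mu - z*sigma to mu + z*sigma."""
    lo, hi = mu - z * sigma, mu + z * sigma
    h = (hi - lo) / steps
    total = 0.5 * (p_gauss(lo, mu, sigma) + p_gauss(hi, mu, sigma))
    total += sum(p_gauss(lo + i * h, mu, sigma) for i in range(1, steps))
    return total * h

print(round(a_gauss(1), 4))   # 0.6827
print(round(a_gauss(5), 4))   # ~1.0, approaching A_G(inf) = 1
```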
• 22. Gaussian Density vs. Distribution Functions [figure: density and cumulative distribution curves; image not recoverable from the slide export]
• 23. Lorentzian or Cauchy Distribution
  • Lorentzian Distribution: A similar distribution function, but unrelated to the binomial distribution. Useful for describing data related to resonance phenomena, with particular applications in nuclear physics (e.g. Mössbauer effect). The distribution is symmetric about µ.
    – A distinctly different probability distribution from the Gaussian function; the mean and standard deviation are not simply defined.
          P_L(x; µ, Γ) = (1/π) (Γ/2) / [ (x − µ)² + (Γ/2)² ]
          A_L(x; µ, Γ) = ∫ from −∞ to +∞ of P_L(x; µ, Γ) dx = (1/π) ∫ from −∞ to +∞ of dz/(1 + z²) = 1,   where z = (x − µ)/(Γ/2)
      Γ = Full Width at Half-Maximum
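A sketch verifying the Lorentzian's FWHM property and (approximate) normalization numerically; µ and Γ are invented, and the finite integration window only approximates the infinite integral because of the heavy tails:

```python
from math import pi

def p_lorentz(x, mu, gamma):
    """Lorentzian density P_L(x; mu, Gamma)."""
    half = gamma / 2
    return (1 / pi) * half / ((x - mu)**2 + half**2)

mu, gamma = 0.0, 2.0

# FWHM check: at x = mu +/- Gamma/2 the density is half its peak value
peak = p_lorentz(mu, mu, gamma)
assert abs(p_lorentz(mu + gamma / 2, mu, gamma) - peak / 2) < 1e-12

# Normalization check via a simple Riemann sum on a wide (finite) window
steps, lo, hi = 200_000, -2000.0, 2000.0
h = (hi - lo) / steps
area = h * sum(p_lorentz(lo + i * h, mu, gamma) for i in range(steps + 1))
print(round(area, 3))   # ~1.0 (the tails converge slowly)
```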