SlideShare una empresa de Scribd logo
1 de 57
ANALYZING QUANTITATIVE DATA
Presented by: Jean Marie Villamor
DEALING WITH DATA
CODING DATA systematically reorganizing raw data into
 a format that is machine-readable (i.e. easy to analyze
 using computers)

   Coding can be a clerical task when the data are recorded
    as numbers on well-organized recording sheets, but it is
    very difficult when, for example, a researcher wants to
    code answers to open-ended survey questions into
    numbers in a process similar to latent content analysis.
DEALING WITH DATA
   Researchers uses coding system and a codebook for
    data coding:

       Coding system  set of rules stating that certain numbers are
        assigned to variable attributes.
        example: a researcher codes males as 1 and females as 2.

       Codebook  a document describing the coding system and
        the location of data for variables in a format that computers
        can use.
DEALING WITH DATA
   Precoding  placing the code categories on the
    questionnaire.

   If a researcher does not precode, his first step after
    collecting data is to create a codebook. He also gives
    each case an identification number to keep track of the
    cases. Next he transfers the information from each
    questionnaire into a format that computers can read.
ENTERING DATA

   Most computer programs designed for data analysis
    need the data in a grid format.

   In the grid, each row represents a respondent, subject, or
    case  DATA RECORDS because each is a record of
    data for a single case

   Column or sets of columns represents specific variables.
CLEANING DATA
 Accuracy is extremely important when coding data.
 Errors made in coding can cause misleading results.

 Highly recommended to recheck all coding.



   Coding can be verified in two ways:
     Possible code cleaning (wild code checking)
      involves checking the categories of all variables for impossible
      codes
     Contingency cleaning (consistency checking)
     involves cross-classifying two variables and looking for
      logically impossible combinations.
COMPUTERS AND SOCIAL RESEARCH
   Researchers use computers to perform specialized tasks
    more efficiently and effectively.

    Example: organize data, calculate statistics, write reports
COMPUTERS AND SOCIAL RESEARCH
   Computer began as mechanical devices for sorting cards
    that had holes punched into them.

   Each hole were punched in specific locations and
    represented information about a variable.

   These machines organized vast amounts of information
    more quickly, reliably, and efficiently than the paper-and-
    pencil methods.

   IBM CARDS  thin cardboard cards that had 80
    columns and 12 rows, or 960 spaces for information.
COMPUTERS AND SOCIAL RESEARCH
 COMPUTER TERMINAL is a simple typewriter-like
  device connected to a mainframe computer, which
  people use to type data and instructions directly into the
  computer.
 MICROCOMPUTERS have replaced mainframes for
  many tasks, to more people, and have stimulated new
  uses form computers.
 Three basic parts:
     Monitor (CRT or VDT)
     Keyboard
     CPU (Central Processing Unit)
COMPUTERS AND SOCIAL RESEARCH
   Information can get into a microcomputer in 4 ways:
     Some is built into the computer memory itself
     User can type it on the keyboard
     It can come across a telephone line and into the computer
      through a modem
     It can be stored on floppy disks, or data travellers which
      computers can read.
HOW COMPUTERS HELP THE RESEARCHER

 Purpose of the computer: to write reports, to organize
  large amounts of data, and to compute statistical
  measures.
 Computers help researchers perform tasks faster and
  with greater accuracy than by hand.
 Without the appropriate computer, a researcher cannot
  analyze data from a large-scale research project or
  calculate complicated statistics.
 SPSS statistical package for the social sciences is a
  statistical software most social researchers use that is
  specifically designed for analyzing quantitative social
  science data.
STATISTICS
   a branch of applied mathematics used to manipulate and
    summarize the features of numbers.

   Descriptive statistics describe numerical data. They can
    be categorized by the number of variables involved:
     Univariate
     Bivariate
     Multivariate
RESULTS WITH ONE VARIABLE (UNIVARIATE)
 Frequency Distributions
 Measures of Central Tendency

 Measures of Variation
RESULTS WITH ONE VARIABLE
   Univariate describes one variable. The easiest way to
    describe the numerical data of one variable is through
    frequency distribution.

FREQUENCY DISTRIBUTION  can be used with
 nominal-, ordinal-, interval-, or ratio-level data and takes
 many forms.
Example: A data having 400 respondents can be
 summarized on the gender of the respondents with a raw
 count or a percentage frequency distribution.
GENDER COUNT OF RESPONDENTS
RAW COUNT FREQUENCY                 PERCENTAGE FREQUENCY
DISTRIBUTION
Distribution
Gender             Frequency        Gender          Frequency
Male               100              Male            25%
Female             300              Female          75%
Total              400              Total           100%

BAR CHART

 Females
                                             Column2
                                             Column1
    Males                                    frequency distribution


               0     20        40       60     80
RESULTS WITH ONE VARIABLE
   Common types of graphic representations:
     Histogram
     Bar chart
     Pie chart


   For interval- or ratio-level data  researcher groups the
    information into categories; grouped categories should be
    mutually exclusive

   Interval- or ratio-level data are often plotted in a
    frequency polygon.
RESULTS WITH ONE VARIABLE
   MEASURES OF CENTRAL TENDENCY
     Mean  arithmetic average
               most widely used measure of central tendency
               can be used only with interval- or ratio-level data.
     Median middle point
                50th percentile, or the point at which half the cases
      are above it and half below it
                can be used with ordinal-, interval-, or ratio-level
      data (not nominal-level)
     Mode easiest to use and can be used with nominal, ordinal
      , interval, or ratio data
              most common or frequently occurring number
RESULTS WITH ONE VARIABLE
   If the frequency distribution forms a NORMAL or BELL-
    SHAPED CURVE, the three measures of central
    tendency equal each other

        number of cases
                                                mean , median, mode




        Lowest            values of variables           highest
RESULTS WITH ONE VARIABLE
   SKEWED DISTRIBUTION  measures of central
    tendency are not equal
                             mode
                    median
             mean




                                    mode
                                           median
                                                    mean
RESULTS WITH ONE VARIABLE
   If most cases have lower scores with a few extreme high
    scores, the mean will be the highest, the median in the
    middle, and the mode the lowest.

   If most cases have higher scores with a few extreme low
    scores, the mean will be the lowest, the median in the
    middle, and the mode the highest.

   In general, the median is best for skewed
    distributions, although the mean is used in most other
    statistics.
RESULTS WITH ONE VARIABLE
   MEASURES OF VARIATION  one- number summary
    of a distribution

   Three basic ways:
     Range
     Percentile
     Standard deviation
RESULTS WITH ONE VARIABLE
 RANGE  simplest
             largest and smallest score
             for ordinal-, interval-, and ratio-level
 PERCENTILES  tell the score at a specific place within
  the distribution
             for ordinal-, interval-, and ratio-level
 STANDARD DEVIATION  most difficult to compute
  measure of dispersion; yet is also the most
  comprehensive and widely used.
        interval or ratio level
        based on the mean and gives an “average
  distance” between all scores and the mean
RESULTS WITH ONE VARIABLE
STEPS IN COMPUTING THE STANDARD DEVIATION
1. Compute the mean.
2. Subtract the mean from each score.
3. Square the resulting difference for each score.
4. Total up the squared differences to get the sum of
    squares.
5. Divide the sum of squares by the number of cases to get
    the variance.
6. Take the square root of the variance, which is the
    standard deviation.
RESULTS WITH ONE VARIABLE
 Standard deviation and the mean are used to create z-
  scores.
 Z-scores  let a researcher compare two or more
  distribution or groups.
              also called a standardized score, expresses
  points or scores on a frequency distribution in terms of a
  number of standard deviations from the mean. Scores
  are in terms of their relative position within a
  distribution, not as absolute values.
EXAMPLE OF COMPUTING THE STANDARD DEVIATION
[8 RESPONDENTS, VARIABLE = YEARS OF SCHOOLING]
        score                 score- mean           squared (score-mean)
         15                        2.5                        6.25
         12                        -0.5                       .25
         12                        -0.5                       .25
         10                        -2.5                       6.25
         16                        3.5                       12.25
         18                        5.5                       30.25
          8                        -4.5                      20.25
          9                        -3.5                      12.25
    Mean = 12.5                                       sum of squares = 88


Variance = sum of squares = 88 = 11
            no. of cases      8
Standard deviation = square root of variance = √11 = 3.317 years
Symbols:              Formula:

X = score of case     Standard deviation
                      √ ∑(x-x )2
∑ = sigma for sum          N

N = number of cases

X = mean
RESULTS WITH TWO VARIABLES (BIVARIATE)
 The Scattergram
 Cross-tabulation/ Percentaged Table

 Measures of Association
RESULTS WITH TWO VARIABLES
   Bavariate statistics  let a researcher consider two
    variables together and describe the relationship between
    variables

   Statistical relationships are based on two ideas:
     Covariation  things that are associated
     Example: life expectancy and income

     Independence  no association or relationship between two
      variables
    Example: number of siblings and life expectancy
RESULTS WITH TWO VARIABLES
   SCATTERGRAM  a graph on which a researcher plots
    each case or observation, where each axis represents
    the value of one variable.
     used for interval- or ratio-level data; rarely for ordinal;
    never for nominal

    independent variable = horizontal axis (x)
    dependent variable = vertical axis (y)
RESULTS WITH TWO VARIABLES
   Three aspects of a bivariate relationship in a scattergram:
       Form
         Independence  random scatter with no pattern, or a straight line that
          is parallel to one of the axis
         Linear  straight line that can be visualized in the middle of a maze of
          cases running from one corner to another
         Curvilinear centre of a maze of cases could form a U curve, right
          side up or upside down, or an S curve
       Direction
           Positive and negative
       Precision
         Amount of spread in the points on the graph
         High level occurs when the points hug the line that summarizes the
          relationships
         Low level occurs when the points are widely spread around the line
RESULTS WITH TWO VARIABLES
 PERCENTAGED TABLES  presents the same
  information as a scattergram in a more condensed form
 Cross-tabulation  cases are organized in the table on
  the basis of two variables at the same time.

   Bavariate tables usually contain percentages.
RESULTS WITH TWO VARIABLES
CONSTRUCTING PERCENTAGE TABLES
1. Raw data
2. Compound frequency distribution (CFD)
   Figure all possible combinations of variable categories
   Make a mark next to the combination category into which each
    case falls
   Add up the marks for the number of cases in a combination
    category
3. Set up the parts of a table (labeling rows and columns)
4. Each number from the CFD is placed in a cell in the
  table that corresponds to the combination of variable
  categories.
THE PARTS OF A TABLE
 Title
 Label row and column variable and give a name to each
  of the variable categories.
 Marginals  totals of the columns and rows
 Body of the table
 Cell of a table
 If there is missing information (cases in which a
  respondent refused to answer, ended interview, said “I
  don’t know”, etc.), report the number of missing cases
  near the table to account for all original cases
 For percentaged tables, provide the number of cases or
  N on which percentages are computed in parentheses
  near the total of 100%. This makes it possible to go back
  and forth from a percentaged table to a raw count table
  and vice versa.
RAW COUNT TABLE
AGE GROUP BY ATTITUDE ABOUT CHANGING THE DRINKING AGE
                                Age group
 Attitude    Under 30   30-45     46-60     61 and   total
                                             older
  Agree         20       10         4         3         37
No opinion     3 (4)     10        10         2         25
 Disagree       3        5         21        10         39
              _____     _____     _____     _____    _____
  Total         26       25        35        15         101
 Missing       =8
cases (6)
COLUMN PERCENTAGED TABLE

AGE GROUP BY ATTITUDE ABOUT CHANGING THE DRINKING AGE

                                 Age group
  Attitude   Under 30   30-45      46-60     61 and     Total
                                              older
  Agree      76.9 %      40%      11.14%      20%     36.6%
No opinion    11.5 %    40 %      28.6 %     13.3 %   24.8%
 Disagree     11.5%      20%       60%       66.7%    38.6%
              _____     ______    ______     ______   _____
   Total      99.9%     100%       100%      100%     100%
    (N)        (26)      (25)      (35)       (15)      (101)
  Missing      =8
  cases
ROW PERCENTAGED TABLE
AGE GROUP BY ATTITUDE ABOUT CHANGING THE DRINKING AGE

                                 Age group
Attitude   Under 30   30-45    46-60   61 and   Total    (N)
                                        older
 Agree      54.1%      27%     10.8%    8.1%    100     (37)
  No         12%       40%     40%      8%      100     (25)
opinion
Disagree    7.7%      12.8%    53.8%   25.6%    99.9    (39)
            _____     ______   _____   _____    _____
 Total      25.7%     24.8%    34.7%   14.9%    100.1   (101)
Missing      =8
cases
RESULTS WITH TWO VARIABLES
   Three ways to percentage a table:
     By row
     By column
     For the total

   First two are most often used; last rare

Which is best?
 Either can be appropriate.
RESULTS WITH TWO VARIABLES
   MEASURES OF ASSOCIATION  single number that
    expresses the strength, and often the direction, of a
    relationship; condenses information about a bivariate
    relationship into a single number

   Many measures are called by letters of the greek
    alphabet (lambda, gamma, tau, chi (squared), and rho).
    The emphasis is on interpreting the measures, not on
    their calculation.
SUMMARY OF MEASURES OF ASSOCIATION
 Measure       Greek   Type of data         High       Independence
              symbol                     association
 Lambda         λ        Nominal             1.0            0
 Gamma          γ         Ordinal         +1.0, -1.0        0
   Tau          τ         Ordinal         +1.0, -1.0        0
(Kendall’s)
   Rho          ρ      Interval, ratio    +1.0, -1.0        0

Chi-squared     χ2       Nominal,          Infinity         0
                          ordinal
MORE THAN TWO VARIABLES (MULTIVARIATE)
 Trivariate percentaged tables
 Multiple regression analysis
MORE THAN TWO VARIABLES
 STATISTICAL CONTROL  control variables
 Researcher controls for alternative explanations in
  multivariate analysis by introducing a third variable.
  Example: Bavariate table showing that taller teens like
  baseball more than shorter ones
 May be spurious since male teens are taller than
  females, and males tend to like baseball more than
  females.
 To test whether the relationship is due to sex, a
  researcher must control for gender
MORE THAN TWO VARIABLES
 TRIAVARIATE PERCENTAGED TABLES
 Trivariate tables has a bivariate table of the independent
  and dependent variable for each category of the control
  variable.  partials
 Number of partials depends on the number of categories
  in the control variable.
 Trivariate tables have 3 limitations:
     Difficult to interpret if a control variable has more than 4
      categories.
     Control variables can be at any level of measurement, but
      interval or ratio control variables must be grouped, and how
      cases are grouped can affect the interpretations of effects.
     Total number of cases is a limiting factor because the cases
      are divided among cells in partial.
MORE THAN TWO VARIABLES
P Number of cells in the partials
 C number of cells in the bivariate relationship
 N number of categories in the control variable
                    P=CxN
Example: Control variable having 3 categories, and a
 bivariate table having 12 cells.
                    P= 12 x 3 = 36 cells
An average of 5 cases per cell is recommended, so the
 researcher will need 5 x 6 = 180 cases at minimum
COMPOUND FREQUENCY DISTRIBUTION FOR TRIVARIATE TABLE
Males                                       Females
Age              Attitude    No. of cases   Age              Attitude    No. of cases
Under 30         Agree       10             Under 30         Agree       10
Under 30         No option   1              Under 30         No option   2
Under 30         Disagree    2              Under 30         Disagree    1
30-45            Agree       5              30-45            Agree       5
30-45            No option   5              30-45            No option   5
30-45            Disagree    2              30-45            Disagree    3
46-60            Agree       2              46-60            Agree       2
46-60            No option   5              46-60            No option   5
46-60            Disagree    11             46-60            Disagree    10
61 and older     Agree       3              61 and older     Agree       0
61 and older     No option   0              61 and older     No option   2
61 and older     Disagree    5____          61 and older     Disagree    5______
                 subtotal    51                              subtotal    50
Missing on either variable   4____          Missing on either variable   4______
No. of males                 55             No. of females               54
PARTIAL TABLE FOR MALES
Attitude    Under 30    30-45    46-60   61 and older   Total
 Agree        10         5         2     3               20
No option      1         5         5     0               11
Disagree       2         2         11    5               20
  Total       13         12        18    8               51
Missing       =4
cases


                   PARTIAL TABLE FOR FEMALES
Attitude    Under 30    30-45    46-60   61 and older   Total
 Agree        10         5         2     0               17
No option      2         5         5     2               14
Disagree       1         3         10    5               19
  Total       13         13        17    7
Missing       =4
cases
MORE THAN TWO VARIABLES
 Elaboration paradigm  system for reading percentaged
  trivariate tables
   describes the pattern that emerges when a control
  variable is introduced
 Four terms describe how the partial tables compare to
  the initial bivariate table, or how the original bivariate
  relationship changes after the control variable is
  considered.
SUMMARY OF THE ELABORATION PARADIGM
Pattern name          Pattern seen when comparing partials to the
                      original bivariate table
Replication           Same relationship in both partials as in bivariate table
Specification         Bivariate relationship seen in one of the partial tables
Interpretation        Bivariate relationship weakens greatly or disappears in
                      the partial tables (control variable intervening)
Explanation           Bivariate relationship weakens greatly or disappears in
                      the partial tables (control variable before independent)
Suppressor variable   No bivariate relationship, relationship appears in partial
                      tables only
EXAMPLES OF ELABORATION PATTERNS

                                REPLICATION
       Bivariate table                         Partials
                                        Control = low     Control = high
            Low          High          Low       High     Low      High
Low         85%          15%    Low    84%       16%      86%      14%
High        15%          85%    High   16$       84%      14%      86%
                    INTERPRETATION OR EXPLANATION
       Bivariate table                         Partials
                                        Control = low     Control = high
            Low          High          Low       High     Low      High
Low         85%          15%    Low    45%       55%      55%      45%
High        15%          85%    High   55%       45%      45%      55%
EXAMPLES OF ELABORATION PATTERNS
                                SPECIFICATION
       Bivariate table                          Partials
                                         Control = low     Control = high
            Low          High           Low       High     Low      High
 Low        85%          15%     Low    95%       5%       50%      50%
High        15%          85%    High     5%       95%      50%      50%
                            SUPPRESSOR VARIABLE
       Bivariate table                          Partials
                                         Control = low     Control = high
            Low          High           Low       High     Low      High
 Low        54%          46%     Low    84%       16%      14%      86%
High        46%          54%    High    16%       84%      86%      14%
MORE THAN TWO VARIABLES
 Replication pattern  easiest to understand; when the
  partials replicate or reproduce the same relationship that
  existed in the bivariate table before considering the
  control variable
   control variable has no effect
 Specification pattern  occurs when one partial replicates
  the initial bivariate relationship, but other partials do not.
   researcher can specify the category of the control
  variable in which the initial relationship persists
Example: college grades and automobiles
MORE THAN TWO VARIABLES
 Interpretation pattern  describes the situation in which
  the control variable intervenes between the original
  independent and dependent variable.
 Explanation pattern  looks the same as interpretation;
  difference is the temporal order of the control variable
   control variable comes before the independent
  variable in the initial bivariate relationship
Suppressor variable pattern  occurs when the bivariate
  tables suggest independence but a relationship appears
  in one or both of the partials
MORE THAN TWO VARIABLES
 MULTIPLE REGRESSION  statistical technique which
  is quickly computed by an appropriate statistics software
 Regression results measure the direction and size of the
  effect of each variable on a dependent variable. The
  effect is measured precisely and given a numerical
  value.
 The effect on the dependent variable is measured by a
  standardized coefficient or the Greek letter beta (β). It is
  equal to the ρ correlation coefficient.
MORE THAN TWO VARIABLES
   Researchers use the beta regression coefficient to
    determine whether control variables have an effect.

Example: the bivariate correlation between X and Y is
 0.75. Researcher statistically considers four control
 variables. If the beta remains at 0.75, then the four
 control variables have no effect. However, if the beta for
 X and Y gets smaller, it indicates that the control
 variables have an effect.
INFERENTIAL STATISTICS
 use probability theory to test hypotheses formally, permit
  inferences from a sample to a population, and test
  whether descriptive results are likely to be due to random
  factors or to a real relationship.
 are a more powerful type of statistics than descriptive
  statistics
 rely on principles from probability sampling, where a
  researcher uses a random process to select a subset of
  cases from the entire population.
 used when researchers conduct various statistical test
  (e.g. t-test or an F-test)
INFERENTIAL STATISTICS
   Statistical significance  results are not likely to be due
    to chance factors
     indicates the probability of finding a relationship in the
    sample where there is none in the population
     uses probability theory and specific statistical tests to
    tell a researcher whether the results are likely to be
    produced by random error in random sampling
INFERENTIAL STATISTICS
 Levels of significance (usually .05, .01, or .001) is a way
  of talking about the likelihood that the results are due to
  chance factors –that is, a relationship appears in the
  sample when there is none in the population.
 If a researcher says that results are significant at the .05
  level, this means:
     Results like these are due to a chance factors only 5 in 100
      times
     There is a 95% chance that the sample results are not due to
      chance factors alone, but reflect the population accurately
     The odds of such results based on chance alone are .05 or 5%
     One can be 95% confident that the results are due to a real
      relationship in the population, not chance factors
INFERENTIAL STATISTICS
  Type I error  occurs when the researcher says that a
    relationship exists when in fact none exists
     falsely rejecting a null hypothesis
  Type II error  occurs when a researcher says that a
    relationship does not exist when in fact it does.
     falsely accepting a null hypothesis

                           True situation in the world
What the researcher says   No relationship          Causal relationship
No relationship            No error                 Type II error
Causal relationship        Type I error             No error

Más contenido relacionado

La actualidad más candente

Introduction to Statistics - Part 2
Introduction to Statistics - Part 2Introduction to Statistics - Part 2
Introduction to Statistics - Part 2
Damian T. Gordon
 
Univariate & bivariate analysis
Univariate & bivariate analysisUnivariate & bivariate analysis
Univariate & bivariate analysis
sristi1992
 
Malimu descriptive statistics.
Malimu descriptive statistics.Malimu descriptive statistics.
Malimu descriptive statistics.
Miharbi Ignasm
 

La actualidad más candente (18)

Statistics in research
Statistics in researchStatistics in research
Statistics in research
 
descriptive data analysis
 descriptive data analysis descriptive data analysis
descriptive data analysis
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
 
Basics of Educational Statistics (Descriptive statistics)
Basics of Educational Statistics (Descriptive statistics)Basics of Educational Statistics (Descriptive statistics)
Basics of Educational Statistics (Descriptive statistics)
 
Introduction to Statistics - Part 2
Introduction to Statistics - Part 2Introduction to Statistics - Part 2
Introduction to Statistics - Part 2
 
Univariate & bivariate analysis
Univariate & bivariate analysisUnivariate & bivariate analysis
Univariate & bivariate analysis
 
Basics of Educational Statistics (Inferential statistics)
Basics of Educational Statistics (Inferential statistics)Basics of Educational Statistics (Inferential statistics)
Basics of Educational Statistics (Inferential statistics)
 
Inferential Statistics
Inferential StatisticsInferential Statistics
Inferential Statistics
 
Repeated-Measures and Two-Factor Analysis of Variance
Repeated-Measures and Two-Factor Analysis of VarianceRepeated-Measures and Two-Factor Analysis of Variance
Repeated-Measures and Two-Factor Analysis of Variance
 
Descriptive statistics i
Descriptive statistics iDescriptive statistics i
Descriptive statistics i
 
Measures of dispersion
Measures  of  dispersionMeasures  of  dispersion
Measures of dispersion
 
Malimu descriptive statistics.
Malimu descriptive statistics.Malimu descriptive statistics.
Malimu descriptive statistics.
 
Data Analysis: Descriptive Statistics
Data Analysis: Descriptive StatisticsData Analysis: Descriptive Statistics
Data Analysis: Descriptive Statistics
 
Tools and Techniques - Statistics: descriptive statistics
Tools and Techniques - Statistics: descriptive statistics Tools and Techniques - Statistics: descriptive statistics
Tools and Techniques - Statistics: descriptive statistics
 
Descriptive Statistics, Numerical Description
Descriptive Statistics, Numerical DescriptionDescriptive Statistics, Numerical Description
Descriptive Statistics, Numerical Description
 
Central tendency
Central tendencyCentral tendency
Central tendency
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendency
 
Basic statistics
Basic statisticsBasic statistics
Basic statistics
 

Destacado (13)

Summary of chapter 9 & 10 dmavd
Summary of chapter 9 & 10 dmavdSummary of chapter 9 & 10 dmavd
Summary of chapter 9 & 10 dmavd
 
Descriptivestatistics
DescriptivestatisticsDescriptivestatistics
Descriptivestatistics
 
Language learning strategies : Rebecca L. Oxford
Language learning strategies : Rebecca L. OxfordLanguage learning strategies : Rebecca L. Oxford
Language learning strategies : Rebecca L. Oxford
 
PDU 211 Research Methods: Analyzing & Interpreting Quantitative Data
PDU 211 Research Methods: Analyzing & Interpreting Quantitative DataPDU 211 Research Methods: Analyzing & Interpreting Quantitative Data
PDU 211 Research Methods: Analyzing & Interpreting Quantitative Data
 
Week 9 validity and reliability
Week 9 validity and reliabilityWeek 9 validity and reliability
Week 9 validity and reliability
 
Louzel Report - Reliability & validity
Louzel Report - Reliability & validity Louzel Report - Reliability & validity
Louzel Report - Reliability & validity
 
Teaching Grammar
Teaching GrammarTeaching Grammar
Teaching Grammar
 
Theories of Teaching in TEFL
Theories of Teaching in TEFL Theories of Teaching in TEFL
Theories of Teaching in TEFL
 
Validity, its types, measurement & factors.
Validity, its types, measurement & factors.Validity, its types, measurement & factors.
Validity, its types, measurement & factors.
 
Language Teaching Approaches and Methods
Language Teaching Approaches and MethodsLanguage Teaching Approaches and Methods
Language Teaching Approaches and Methods
 
Presentation Validity & Reliability
Presentation Validity & ReliabilityPresentation Validity & Reliability
Presentation Validity & Reliability
 
Quantitative And Qualitative Research
Quantitative And Qualitative ResearchQuantitative And Qualitative Research
Quantitative And Qualitative Research
 
Qualitative and quantitative methods of research
Qualitative and quantitative methods of researchQualitative and quantitative methods of research
Qualitative and quantitative methods of research
 

Similar a Analyzing quantitative data

B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2
marshalkalra
 
3.2 measures of variation
3.2 measures of variation3.2 measures of variation
3.2 measures of variation
leblance
 
CABT Math 8 measures of central tendency and dispersion
CABT Math 8   measures of central tendency and dispersionCABT Math 8   measures of central tendency and dispersion
CABT Math 8 measures of central tendency and dispersion
Gilbert Joseph Abueg
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
Burak Mızrak
 

Similar a Analyzing quantitative data (20)

B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2
 
3.2 measures of variation
3.2 measures of variation3.2 measures of variation
3.2 measures of variation
 
Ders 1 mean mod media st dev.pptx
Ders 1 mean mod media st dev.pptxDers 1 mean mod media st dev.pptx
Ders 1 mean mod media st dev.pptx
 
STATISTICAL PROCEDURES (Discriptive Statistics).pptx
STATISTICAL PROCEDURES (Discriptive Statistics).pptxSTATISTICAL PROCEDURES (Discriptive Statistics).pptx
STATISTICAL PROCEDURES (Discriptive Statistics).pptx
 
Data Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model SelectionData Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model Selection
 
CABT Math 8 measures of central tendency and dispersion
CABT Math 8   measures of central tendency and dispersionCABT Math 8   measures of central tendency and dispersion
CABT Math 8 measures of central tendency and dispersion
 
Kinds Of Variable
Kinds Of VariableKinds Of Variable
Kinds Of Variable
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
2-Descriptive statistics.pptx
2-Descriptive statistics.pptx2-Descriptive statistics.pptx
2-Descriptive statistics.pptx
 
1.1 STATISTICS
1.1 STATISTICS1.1 STATISTICS
1.1 STATISTICS
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysis
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Statistical treatment and data processing copy
Statistical treatment and data processing   copyStatistical treatment and data processing   copy
Statistical treatment and data processing copy
 
5.DATA SUMMERISATION.ppt
5.DATA SUMMERISATION.ppt5.DATA SUMMERISATION.ppt
5.DATA SUMMERISATION.ppt
 
measure of dispersion
measure of dispersion measure of dispersion
measure of dispersion
 
Types of variables and descriptive statistics
Types of variables and descriptive statisticsTypes of variables and descriptive statistics
Types of variables and descriptive statistics
 
Class1.ppt
Class1.pptClass1.ppt
Class1.ppt
 
Class1.ppt
Class1.pptClass1.ppt
Class1.ppt
 
Class1.ppt
Class1.pptClass1.ppt
Class1.ppt
 
Class1.ppt
Class1.pptClass1.ppt
Class1.ppt
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Último (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Analyzing quantitative data

  • 1. ANALYZING QUANTITATIVE DATA Presented by: Jean Marie Villamor
  • 2. DEALING WITH DATA CODING DATA systematically reorganizing raw data into a format that is machine-readable (i.e. easy to analyze using computers)  Coding can be a clerical task when the data are recorded as numbers on well-organized recording sheets, but it is very difficult when, for example, a researcher wants to code answers to open-ended survey questions into numbers in a process similar to latent content analysis.
  • 3. DEALING WITH DATA  Researchers uses coding system and a codebook for data coding:  Coding system  set of rules stating that certain numbers are assigned to variable attributes. example: a researcher codes males as 1 and females as 2.  Codebook  a document describing the coding system and the location of data for variables in a format that computers can use.
  • 4. DEALING WITH DATA  Precoding  placing the code categories on the questionnaire.  If a researcher does not precode, his first step after collecting data is to create a codebook. He also gives each case an identification number to keep track of the cases. Next he transfers the information from each questionnaire into a format that computers can read.
  • 5. ENTERING DATA  Most computer programs designed for data analysis need the data in a grid format.  In the grid, each row represents a respondent, subject, or case  DATA RECORDS because each is a record of data for a single case  Column or sets of columns represents specific variables.
  • 6. CLEANING DATA  Accuracy is extremely important when coding data.  Errors made in coding can cause misleading results.  Highly recommended to recheck all coding.  Coding can be verified in two ways:  Possible code cleaning (wild code checking)  involves checking the categories of all variables for impossible codes  Contingency cleaning (consistency checking)  involves cross-classifying two variables and looking for logically impossible combinations.
  • 7. COMPUTERS AND SOCIAL RESEARCH  Researchers use computers to perform specialized tasks more efficiently and effectively. Example: organize data, calculate statistics, write reports
  • 8. COMPUTERS AND SOCIAL RESEARCH  Computer began as mechanical devices for sorting cards that had holes punched into them.  Each hole were punched in specific locations and represented information about a variable.  These machines organized vast amounts of information more quickly, reliably, and efficiently than the paper-and- pencil methods.  IBM CARDS  thin cardboard cards that had 80 columns and 12 rows, or 960 spaces for information.
  • 9. COMPUTERS AND SOCIAL RESEARCH  COMPUTER TERMINAL is a simple typewriter-like device connected to a mainframe computer, which people use to type data and instructions directly into the computer.  MICROCOMPUTERS have replaced mainframes for many tasks, to more people, and have stimulated new uses form computers.  Three basic parts:  Monitor (CRT or VDT)  Keyboard  CPU (Central Processing Unit)
  • 10. COMPUTERS AND SOCIAL RESEARCH  Information can get into a microcomputer in 4 ways:  Some is built into the computer memory itself  User can type it on the keyboard  It can come across a telephone line and into the computer through a modem  It can be stored on floppy disks, or data travellers which computers can read.
  • 11. HOW COMPUTERS HELP THE RESEARCHER  Purpose of the computer: to write reports, to organize large amounts of data, and to compute statistical measures.  Computers help researchers perform tasks faster and with greater accuracy than by hand.  Without the appropriate computer, a researcher cannot analyze data from a large-scale research project or calculate complicated statistics.  SPSS statistical package for the social sciences is a statistical software most social researchers use that is specifically designed for analyzing quantitative social science data.
  • 12. STATISTICS  a branch of applied mathematics used to manipulate and summarize the features of numbers.  Descriptive statistics describe numerical data. They can be categorized by the number of variables involved:  Univariate  Bivariate  Multivariate
  • 13. RESULTS WITH ONE VARIABLE (UNIVARIATE)  Frequency Distributions  Measures of Central Tendency  Measures of Variation
  • 14. RESULTS WITH ONE VARIABLE  Univariate describes one variable. The easiest way to describe the numerical data of one variable is through frequency distribution. FREQUENCY DISTRIBUTION  can be used with nominal-, ordinal-, interval-, or ratio-level data and takes many forms. Example: A data having 400 respondents can be summarized on the gender of the respondents with a raw count or a percentage frequency distribution.
  • 15. GENDER COUNT OF RESPONDENTS RAW COUNT FREQUENCY PERCENTAGE FREQUENCY DISTRIBUTION Distribution Gender Frequency Gender Frequency Male 100 Male 25% Female 300 Female 75% Total 400 Total 100% BAR CHART Females Column2 Column1 Males frequency distribution 0 20 40 60 80
  • 16. RESULTS WITH ONE VARIABLE  Common types of graphic representations:  Histogram  Bar chart  Pie chart  For interval- or ratio-level data  researcher groups the information into categories; grouped categories should be mutually exclusive  Interval- or ratio-level data are often plotted in a frequency polygon.
  • 17. RESULTS WITH ONE VARIABLE  MEASURES OF CENTRAL TENDENCY  Mean  arithmetic average  most widely used measure of central tendency  can be used only with interval- or ratio-level data.  Median middle point  50th percentile, or the point at which half the cases are above it and half below it  can be used with ordinal-, interval-, or ratio-level data (not nominal-level)  Mode easiest to use and can be used with nominal, ordinal , interval, or ratio data  most common or frequently occurring number
  • 18. RESULTS WITH ONE VARIABLE  If the frequency distribution forms a NORMAL or BELL- SHAPED CURVE, the three measures of central tendency equal each other number of cases mean , median, mode Lowest values of variables highest
  • 19. RESULTS WITH ONE VARIABLE  SKEWED DISTRIBUTION  measures of central tendency are not equal mode median mean mode median mean
  • 20. RESULTS WITH ONE VARIABLE  If most cases have lower scores with a few extreme high scores, the mean will be the highest, the median in the middle, and the mode the lowest.  If most cases have higher scores with a few extreme low scores, the mean will be the lowest, the median in the middle, and the mode the highest.  In general, the median is best for skewed distributions, although the mean is used in most other statistics.
  • 21. RESULTS WITH ONE VARIABLE  MEASURES OF VARIATION  one- number summary of a distribution  Three basic ways:  Range  Percentile  Standard deviation
  • 22. RESULTS WITH ONE VARIABLE  RANGE  simplest  largest and smallest score  for ordinal-, interval-, and ratio-level  PERCENTILES  tell the score at a specific place within the distribution  for ordinal-, interval-, and ratio-level  STANDARD DEVIATION  most difficult to compute measure of dispersion; yet is also the most comprehensive and widely used.  interval or ratio level  based on the mean and gives an “average distance” between all scores and the mean
  • 23. RESULTS WITH ONE VARIABLE STEPS IN COMPUTING THE STANDARD DEVIATION 1. Compute the mean. 2. Subtract the mean from each score. 3. Square the resulting difference for each score. 4. Total up the squared differences to get the sum of squares. 5. Divide the sum of squares by the number of cases to get the variance. 6. Take the square root of the variance, which is the standard deviation.
  • 24. RESULTS WITH ONE VARIABLE  Standard deviation and the mean are used to create z- scores.  Z-scores  let a researcher compare two or more distribution or groups.  also called a standardized score, expresses points or scores on a frequency distribution in terms of a number of standard deviations from the mean. Scores are in terms of their relative position within a distribution, not as absolute values.
  • 25. EXAMPLE OF COMPUTING THE STANDARD DEVIATION [8 RESPONDENTS, VARIABLE = YEARS OF SCHOOLING] score score- mean squared (score-mean) 15 2.5 6.25 12 -0.5 .25 12 -0.5 .25 10 -2.5 6.25 16 3.5 12.25 18 5.5 30.25 8 -4.5 20.25 9 -3.5 12.25 Mean = 12.5 sum of squares = 88 Variance = sum of squares = 88 = 11 no. of cases 8 Standard deviation = square root of variance = √11 = 3.317 years
  • 26. Symbols: Formula: X = score of case Standard deviation √ ∑(x-x )2 ∑ = sigma for sum N N = number of cases X = mean
  • 27. RESULTS WITH TWO VARIABLES (BIVARIATE)  The Scattergram  Cross-tabulation/ Percentaged Table  Measures of Association
  • 28. RESULTS WITH TWO VARIABLES  Bavariate statistics  let a researcher consider two variables together and describe the relationship between variables  Statistical relationships are based on two ideas:  Covariation  things that are associated Example: life expectancy and income  Independence  no association or relationship between two variables Example: number of siblings and life expectancy
  • 29. RESULTS WITH TWO VARIABLES  SCATTERGRAM  a graph on which a researcher plots each case or observation, where each axis represents the value of one variable.  used for interval- or ratio-level data; rarely for ordinal; never for nominal independent variable = horizontal axis (x) dependent variable = vertical axis (y)
  • 30. RESULTS WITH TWO VARIABLES  Three aspects of a bivariate relationship in a scattergram:  Form  Independence  random scatter with no pattern, or a straight line that is parallel to one of the axis  Linear  straight line that can be visualized in the middle of a maze of cases running from one corner to another  Curvilinear centre of a maze of cases could form a U curve, right side up or upside down, or an S curve  Direction  Positive and negative  Precision  Amount of spread in the points on the graph  High level occurs when the points hug the line that summarizes the relationships  Low level occurs when the points are widely spread around the line
  • 31. RESULTS WITH TWO VARIABLES  PERCENTAGED TABLES  presents the same information as a scattergram in a more condensed form  Cross-tabulation  cases are organized in the table on the basis of two variables at the same time.  Bavariate tables usually contain percentages.
  • 32. RESULTS WITH TWO VARIABLES CONSTRUCTING PERCENTAGE TABLES 1. Raw data 2. Compound frequency distribution (CFD)  Figure all possible combinations of variable categories  Make a mark next to the combination category into which each case falls  Add up the marks for the number of cases in a combination category 3. Set up the parts of a table (labeling rows and columns) 4. Each number from the CFD is placed in a cell in the table that corresponds to the combination of variable categories.
  • 33. THE PARTS OF A TABLE  Title  Label row and column variable and give a name to each of the variable categories.  Marginals  totals of the columns and rows  Body of the table  Cell of a table  If there is missing information (cases in which a respondent refused to answer, ended interview, said “I don’t know”, etc.), report the number of missing cases near the table to account for all original cases  For percentaged tables, provide the number of cases or N on which percentages are computed in parentheses near the total of 100%. This makes it possible to go back and forth from a percentaged table to a raw count table and vice versa.
  • 34. RAW COUNT TABLE AGE GROUP BY ATTITUDE ABOUT CHANGING THE DRINKING AGE Age group Attitude Under 30 30-45 46-60 61 and total older Agree 20 10 4 3 37 No opinion 3 (4) 10 10 2 25 Disagree 3 5 21 10 39 _____ _____ _____ _____ _____ Total 26 25 35 15 101 Missing =8 cases (6)
  • 35. COLUMN PERCENTAGED TABLE AGE GROUP BY ATTITUDE ABOUT CHANGING THE DRINKING AGE Age group Attitude Under 30 30-45 46-60 61 and Total older Agree 76.9 % 40% 11.14% 20% 36.6% No opinion 11.5 % 40 % 28.6 % 13.3 % 24.8% Disagree 11.5% 20% 60% 66.7% 38.6% _____ ______ ______ ______ _____ Total 99.9% 100% 100% 100% 100% (N) (26) (25) (35) (15) (101) Missing =8 cases
  • 36. ROW PERCENTAGED TABLE AGE GROUP BY ATTITUDE ABOUT CHANGING THE DRINKING AGE Age group Attitude Under 30 30-45 46-60 61 and Total (N) older Agree 54.1% 27% 10.8% 8.1% 100 (37) No 12% 40% 40% 8% 100 (25) opinion Disagree 7.7% 12.8% 53.8% 25.6% 99.9 (39) _____ ______ _____ _____ _____ Total 25.7% 24.8% 34.7% 14.9% 100.1 (101) Missing =8 cases
  • 37. RESULTS WITH TWO VARIABLES  Three ways to percentage a table:  By row  By column  For the total  First two are most often used; last rare Which is best?  Either can be appropriate.
  • 38. RESULTS WITH TWO VARIABLES  MEASURES OF ASSOCIATION  single number that expresses the strength, and often the direction, of a relationship; condenses information about a bivariate relationship into a single number  Many measures are called by letters of the greek alphabet (lambda, gamma, tau, chi (squared), and rho). The emphasis is on interpreting the measures, not on their calculation.
  • 39. SUMMARY OF MEASURES OF ASSOCIATION Measure Greek Type of data High Independence symbol association Lambda λ Nominal 1.0 0 Gamma γ Ordinal +1.0, -1.0 0 Tau τ Ordinal +1.0, -1.0 0 (Kendall’s) Rho ρ Interval, ratio +1.0, -1.0 0 Chi-squared χ2 Nominal, Infinity 0 ordinal
  • 40. MORE THAN TWO VARIABLES (MULTIVARIATE)  Trivariate percentaged tables  Multiple regression analysis
  • 41. MORE THAN TWO VARIABLES  STATISTICAL CONTROL  control variables  Researcher controls for alternative explanations in multivariate analysis by introducing a third variable. Example: Bavariate table showing that taller teens like baseball more than shorter ones  May be spurious since male teens are taller than females, and males tend to like baseball more than females.  To test whether the relationship is due to sex, a researcher must control for gender
  • 42. MORE THAN TWO VARIABLES  TRIAVARIATE PERCENTAGED TABLES  Trivariate tables has a bivariate table of the independent and dependent variable for each category of the control variable.  partials  Number of partials depends on the number of categories in the control variable.  Trivariate tables have 3 limitations:  Difficult to interpret if a control variable has more than 4 categories.  Control variables can be at any level of measurement, but interval or ratio control variables must be grouped, and how cases are grouped can affect the interpretations of effects.  Total number of cases is a limiting factor because the cases are divided among cells in partial.
  • 43. MORE THAN TWO VARIABLES P Number of cells in the partials C number of cells in the bivariate relationship N number of categories in the control variable P=CxN Example: Control variable having 3 categories, and a bivariate table having 12 cells. P= 12 x 3 = 36 cells An average of 5 cases per cell is recommended, so the researcher will need 5 x 6 = 180 cases at minimum
  • 44. COMPOUND FREQUENCY DISTRIBUTION FOR TRIVARIATE TABLE Males Females Age Attitude No. of cases Age Attitude No. of cases Under 30 Agree 10 Under 30 Agree 10 Under 30 No option 1 Under 30 No option 2 Under 30 Disagree 2 Under 30 Disagree 1 30-45 Agree 5 30-45 Agree 5 30-45 No option 5 30-45 No option 5 30-45 Disagree 2 30-45 Disagree 3 46-60 Agree 2 46-60 Agree 2 46-60 No option 5 46-60 No option 5 46-60 Disagree 11 46-60 Disagree 10 61 and older Agree 3 61 and older Agree 0 61 and older No option 0 61 and older No option 2 61 and older Disagree 5____ 61 and older Disagree 5______ subtotal 51 subtotal 50 Missing on either variable 4____ Missing on either variable 4______ No. of males 55 No. of females 54
  • 45. PARTIAL TABLE FOR MALES Attitude Under 30 30-45 46-60 61 and older Total Agree 10 5 2 3 20 No option 1 5 5 0 11 Disagree 2 2 11 5 20 Total 13 12 18 8 51 Missing =4 cases PARTIAL TABLE FOR FEMALES Attitude Under 30 30-45 46-60 61 and older Total Agree 10 5 2 0 17 No option 2 5 5 2 14 Disagree 1 3 10 5 19 Total 13 13 17 7 Missing =4 cases
  • 46. MORE THAN TWO VARIABLES  Elaboration paradigm  system for reading percentaged trivariate tables  describes the pattern that emerges when a control variable is introduced  Four terms describe how the partial tables compare to the initial bivariate table, or how the original bivariate relationship changes after the control variable is considered.
  • 47. SUMMARY OF THE ELABORATION PARADIGM Pattern name Pattern seen when comparing partials to the original bivariate table Replication Same relationship in both partials as in bivariate table Specification Bivariate relationship seen in one of the partial tables Interpretation Bivariate relationship weakens greatly or disappears in the partial tables (control variable intervening) Explanation Bivariate relationship weakens greatly or disappears in the partial tables (control variable before independent) Suppressor variable No bivariate relationship, relationship appears in partial tables only
  • 48. EXAMPLES OF ELABORATION PATTERNS REPLICATION Bivariate table Partials Control = low Control = high Low High Low High Low High Low 85% 15% Low 84% 16% 86% 14% High 15% 85% High 16$ 84% 14% 86% INTERPRETATION OR EXPLANATION Bivariate table Partials Control = low Control = high Low High Low High Low High Low 85% 15% Low 45% 55% 55% 45% High 15% 85% High 55% 45% 45% 55%
  • 49. EXAMPLES OF ELABORATION PATTERNS SPECIFICATION Bivariate table Partials Control = low Control = high Low High Low High Low High Low 85% 15% Low 95% 5% 50% 50% High 15% 85% High 5% 95% 50% 50% SUPPRESSOR VARIABLE Bivariate table Partials Control = low Control = high Low High Low High Low High Low 54% 46% Low 84% 16% 14% 86% High 46% 54% High 16% 84% 86% 14%
  • 50. MORE THAN TWO VARIABLES  Replication pattern  easiest to understand; when the partials replicate or reproduce the same relationship that existed in the bivariate table before considering the control variable  control variable has no effect  Specification pattern  occurs when one partial replicates the initial bivariate relationship, but other partials do not.  researcher can specify the category of the control variable in which the initial relationship persists Example: college grades and automobiles
  • 51. MORE THAN TWO VARIABLES  Interpretation pattern  describes the situation in which the control variable intervenes between the original independent and dependent variable.  Explanation pattern  looks the same as interpretation; difference is the temporal order of the control variable  control variable comes before the independent variable in the initial bivariate relationship Suppressor variable pattern  occurs when the bivariate tables suggest independence but a relationship appears in one or both of the partials
  • 52. MORE THAN TWO VARIABLES  MULTIPLE REGRESSION  statistical technique which is quickly computed by an appropriate statistics software  Regression results measure the direction and size of the effect of each variable on a dependent variable. The effect is measured precisely and given a numerical value.  The effect on the dependent variable is measured by a standardized coefficient or the Greek letter beta (β). It is equal to the ρ correlation coefficient.
  • 53. MORE THAN TWO VARIABLES  Researchers use the beta regression coefficient to determine whether control variables have an effect. Example: the bivariate correlation between X and Y is 0.75. Researcher statistically considers four control variables. If the beta remains at 0.75, then the four control variables have no effect. However, if the beta for X and Y gets smaller, it indicates that the control variables have an effect.
  • 54. INFERENTIAL STATISTICS  use probability theory to test hypotheses formally, permit inferences from a sample to a population, and test whether descriptive results are likely to be due to random factors or to a real relationship.  are a more powerful type of statistics than descriptive statistics  rely on principles from probability sampling, where a researcher uses a random process to select a subset of cases from the entire population.  used when researchers conduct various statistical test (e.g. t-test or an F-test)
  • 55. INFERENTIAL STATISTICS  Statistical significance  results are not likely to be due to chance factors  indicates the probability of finding a relationship in the sample where there is none in the population  uses probability theory and specific statistical tests to tell a researcher whether the results are likely to be produced by random error in random sampling
  • 56. INFERENTIAL STATISTICS  Levels of significance (usually .05, .01, or .001) is a way of talking about the likelihood that the results are due to chance factors –that is, a relationship appears in the sample when there is none in the population.  If a researcher says that results are significant at the .05 level, this means:  Results like these are due to a chance factors only 5 in 100 times  There is a 95% chance that the sample results are not due to chance factors alone, but reflect the population accurately  The odds of such results based on chance alone are .05 or 5%  One can be 95% confident that the results are due to a real relationship in the population, not chance factors
  • 57. INFERENTIAL STATISTICS Type I error  occurs when the researcher says that a relationship exists when in fact none exists  falsely rejecting a null hypothesis Type II error  occurs when a researcher says that a relationship does not exist when in fact it does.  falsely accepting a null hypothesis True situation in the world What the researcher says No relationship Causal relationship No relationship No error Type II error Causal relationship Type I error No error