VALIDITY AND
RELIABILITY
SEFA SONER BAYRAKTAR
MEHMET AKIF ERSOY UNIVERSITY
1930461005
bayraktarss@hotmail.com
Hypotheses
A hypothesis is a type of prediction found in many experimental studies; it is a
statement about what we expect to happen in a study.
In research reports there are generally two types of hypotheses: research
hypotheses and null hypotheses.
The null hypothesis (often written as H0) is a neutral statement used as a basis
for testing.
The null hypothesis states that there is no relationship between items under
investigation.
The statistical task is to
reject the null hypothesis
and to show that there is a
relationship between X and
Y.
If, based on previous research
reports in the literature, we
expect a particular outcome,
we can form research
hypotheses.
The first possibility is to predict that there
will be a difference between
two groups, although we do
not have sufficient information
to predict the direction of the
difference.
This is known as a
nondirectional or two-way
hypothesis.
On the other hand, we may have
enough information to predict a
difference in one direction or
another.
This is called a directional
or one-way hypothesis.
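As a rough illustration, a nondirectional hypothesis corresponds to a two-tailed significance test and a directional hypothesis to a one-tailed test. The sketch below assumes two hypothetical score lists, group_a and group_b, and uses scipy.stats.ttest_ind (SciPy 1.6 or later for the alternative argument).

```python
# Sketch: nondirectional vs. directional hypotheses as two-tailed vs. one-tailed t-tests.
from scipy import stats

group_a = [72, 68, 75, 80, 66, 71, 77, 69]   # hypothetical scores
group_b = [70, 74, 78, 82, 73, 76, 79, 75]   # hypothetical scores

# Nondirectional (two-way) hypothesis: the groups differ, direction unspecified.
t_two, p_two = stats.ttest_ind(group_a, group_b, alternative="two-sided")

# Directional (one-way) hypothesis: group_b scores higher than group_a.
t_one, p_one = stats.ttest_ind(group_b, group_a, alternative="greater")

print(f"two-tailed p = {p_two:.3f}, one-tailed p = {p_one:.3f}")
# If p falls below the chosen alpha level (e.g., .05), the null hypothesis is rejected.
```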
VARIABLE
TYPES
In order to carry out any sort of
measurement, we need to think about
variables; that is, characteristics that
vary from person to person, text to
text, or object to object.
Simply put, variables are features or
qualities that change.
INDEPENDENT AND DEPENDENT VARIABLES
There are two main
variable types:
independent and
dependent.
The independent
variable is the one that
we believe may "cause"
the results; the
dependent variable is
the one we measure to
see the effects the
independent variable
has on it.
MODERATOR
VARIABLES
• Moderator variables are
characteristics of
individuals or of treatment
variables that may result in
an interaction between an
independent variable and
other variables
INTERVENING VARIABLES
• Intervening variables are similar to
moderator variables, but they are not
included in an original study either
because the researcher has not
considered the possibility of their
effect or because they cannot be
identified in a precise way.
CONTROL
VARIABLES
• When conducting research, one
ideally wants to study simply the
effects of the independent
variable on a dependent variable.
• Variables that might interfere with the findings
include the possibility that learners with different
levels of proficiency respond differently to
different types of feedback.
OPERATIONALIZATION
Once a variable has been
operationalized, it is possible to use it
in measurements.
An operational definition allows
researchers to operate, or work, with
the variables. Operationalizations
allow measurement.
MEASURING VARIABLES: SCALES OF
MEASUREMENT
• The three commonly used
scales are:
1-Nominal
2-Ordinal
3-Interval.
1-Nominal scales are used for attributes or categories
and allow researchers to categorize variables into two or
more groups. With nominal scales, different categories
can be assigned numerical values.
2-An ordinal scale is one in which ordering is implied.
For example, student test scores are often ordered from
best to worst or worst to best, with the result that there
is a 1st-ranked student, a 2nd-ranked student, a 10th-
ranked student, and so forth.
3-An interval scale represents the order of a
variable's values, but unlike an ordinal scale it also
reflects the interval or distance between points in
the ranking.
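A minimal sketch of the three scale types, using hypothetical learner names and values:

```python
# Sketch: nominal, ordinal, and interval scales with hypothetical learner data.

# Nominal: categories with arbitrary numeric codes (no order implied).
first_language = {"Turkish": 1, "Spanish": 2, "Korean": 3}

# Ordinal: ranks preserve order but not the distance between ranks.
class_rank = {"Ayse": 1, "Ben": 2, "Chen": 3}   # 1st, 2nd, 3rd

# Interval: equal distances between points are meaningful.
test_scores = {"Ayse": 92, "Ben": 85, "Chen": 78}

# Subtracting interval values is meaningful (92 - 85 = 7 points),
# but subtracting nominal codes is not (2 - 1 does not mean Spanish minus Turkish).
print(test_scores["Ayse"] - test_scores["Ben"])
```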
VALIDITY
• After spending a great deal of
time and effort designing a study,
we want to make sure that the
results of our study are valid.
• That is, we want them to reflect
what we believe they reflect and
that they are meaningful in the
sense that they have significance
not only to the population that
was tested, but, at least for most
experimental research, to a
broader, relevant population.
There are many types of validity:
Content validity
Face validity
Construct validity
Criterion-related validity
Predictive validity
CONTENT VALIDITY
Content validity refers to the representativeness of our
measurement regarding the phenomenon about which
we want information.
If our test consists only of sentences containing some of the
possible relative clause types, then our testing instrument is not
sensitive to the full range of relative clause types, and we can say
that it lacks content validity.
FACE VALIDITY
Face validity is closely related to the
notion of content validity and refers
to the familiarity of our instrument
and how easy it is to convince others
that there is content validity to it.
If learners are presented with
reasoning tasks to carry out in an
experiment and are already familiar
with these sorts of tasks because
they have carried them out in their
classrooms, we can say that the task
has face validity for the learners.
Construct Validity
This is perhaps the most complex of the validity types discussed so far.
Construct validity is an essential topic in second language acquisition research
precisely because many of the variables investigated are not easily or directly
defined.
In second language research, variables such as language proficiency, aptitude,
exposure to input, and linguistic representations are of interest.
However, these constructs are not directly measurable in
the way that height, weight, or age are.
In research, construct validity refers to the degree to which
the research adequately captures the construct of interest.
Construct validity can be enhanced when multiple
estimates of a construct are used.
Criterion-
Related
Validity
Criterion-related validity refers to the extent to
which tests used in a research study are
comparable to other well-established tests of
the construct in question.
If one measures the performance of a group
of students on the local test and on a well-
established test and finds a good correlation
between the two, one can then say that the
local test has been demonstrated to have
criterion-related validity.
Predictive
Validity
Predictive validity deals with the use
that one might eventually want to make
of a particular measure.
Considering the earlier example of a
local language test, if the test predicts
performance on some other dimension
(class grades), the test can be said to
have predictive validity.
Internal
Validity
Internal validity refers to the extent to
which the results of a study are a function
of the factor that the researcher intends
them to be a function of.
A researcher must control for (i.e., rule
out) all other possible factors that could
potentially account for the results.
It is important to think through a design
carefully to eliminate or at least minimize
threats to internal validity.
• There are many ways that internal validity
can be compromised, some of the most
common and important of which include
participant characteristics, participant
mortality (dropout rate), participant
inattention and attitude, participant
maturation, data collection (location and
collector), and instrumentation and test
effects.
Participant
Characteristics
Language Background.
Language Learning Experience.
Proficiency Level.
Participant Mortality
Participant Inattention
and Attitude
Participant Maturation
Data
Collection:
Location and
Collector
Not all research studies
will be affected by the
location of data collection,
but some might.
If two groups are given the
same test, the results might be
influenced if one group is in a
noisy or uncomfortable setting
and the other is not.
Another factor in some
types of research relates to
the person doing the data
collection.
One could imagine different
results depending on
whether or not the
interviewer is a member of
the native culture or speaks
the native language.
Instrumentation
and Test Effects
• In this section we discuss three
factors that may affect internal
validity: equivalence between pre-
and posttests, giving the goal of
the study away, and test
instructions and questions.
Equivalence Between
Pre- and Posttests.
Giving the Goal of the
Study Away.
Instructions/Questions.
External
Validity
• With external validity, we are
concerned with the generalizability of
our findings, or in other words, the
extent to which the findings of the
study are relevant not only to the
research population, but also to the
wider population of language learners.
• It is important to remember that a
prerequisite of external validity is
internal validity.
Sampling
Random Sampling. Random sampling
refers to the selection of participants
from the general population that the
sample will represent.
There are two common types of
random sampling: simple random (e.g.,
putting all names in a hat and drawing
from that pool) and stratified random
sampling (e.g., random sampling based
on categories).
Simple random sampling is
generally believed to be the best
way to obtain a sample that is
representative of the population,
especially as the sample size gets
larger.
The key to simple random
sampling is ensuring that each
and every member of a
population has an equal and
independent chance of being
selected for the research.
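A minimal sketch of simple random sampling, assuming a hypothetical roster of 200 learners and Python's random module:

```python
# Sketch: simple random sampling -- every member of the population
# has an equal and independent chance of being selected.
import random

population = [f"learner_{i:03d}" for i in range(1, 201)]  # hypothetical roster
sample = random.sample(population, k=30)                  # draw 30 without replacement
print(sample[:5])
```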
STRATIFIED RANDOM
SAMPLING
Stratified random sampling
provides precision in terms of the
representativeness of the sample
and allows preselected
characteristics to be used as
variables.
In stratified random sampling, the
proportions of the subgroups in the
population are first determined, and
then participants are randomly
selected from within each stratum
according to the established
proportions.
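A minimal sketch of stratified random sampling, assuming hypothetical proficiency-level strata and proportional allocation:

```python
# Sketch: stratified random sampling -- determine the subgroup proportions first,
# then randomly sample within each stratum according to those proportions.
import random

# Hypothetical population grouped by proficiency level (the strata).
strata = {
    "beginner":     [f"b{i}" for i in range(120)],
    "intermediate": [f"i{i}" for i in range(60)],
    "advanced":     [f"a{i}" for i in range(20)],
}
total = sum(len(members) for members in strata.values())
sample_size = 40

sample = []
for level, members in strata.items():
    n = round(sample_size * len(members) / total)   # keep the population proportion
    sample.extend(random.sample(members, n))

print(len(sample), sample[:5])
```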
Nonrandom
Sampling
Nonrandom sampling methods
are also common in second
language research.
Common nonrandom methods
include systematic, convenience,
and purposive sampling.
Systematic sampling is the choice of every
nth individual in a population list (where the
list should not be ordered systematically).
Convenience sampling is the selection of
individuals who happen to be available for
study.
In a purposive sample, researchers knowingly
select individuals based on their knowledge
of the population and in order to elicit data in
which they are interested.
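A minimal sketch of systematic sampling, assuming a hypothetical list of 200 learners and a sampling interval of 10:

```python
# Sketch: systematic sampling -- every nth individual from a list
# that is not itself ordered systematically.
import random

population = [f"learner_{i:03d}" for i in range(1, 201)]  # hypothetical list
n = 10                                                    # sampling interval
start = random.randrange(n)                               # random starting point
systematic_sample = population[start::n]
print(systematic_sample)
```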
Representativeness
and Generalizability
• If researchers want the
results of a particular study
to be generalizable, it is
incumbent upon them to
make an argument about
the representativeness of
the sample.
Fraenkel and Wallen (2003) provided the following minimum sample
numbers as a guideline:
100 for descriptive studies
50 for correlational studies
15 to 30 per group in experimental studies
depending on how tightly controlled they are.
• In second language studies, small groups are
sometimes appropriate as long as the techniques
for analysis take the numbers into account.
Collecting
Biodata
Information
• When reporting research, it is important to
include sufficient information to allow the
reader to determine the extent to which the
results of your study are indeed
generalizable to a new context. For this
reason, the collection of biodata information
is an integral part of one's database.
• It is recommended that the researcher include
enough information for the study to be replicable
(American Psychological Association, 2001) and
for our purposes in this chapter enough
information for readers to determine
generalizability.
• Two considerations must be balanced: the first is
the privacy and anonymity of the participants; the
second is the need to report sufficient data about
the participants to allow future researchers to both
evaluate and replicate the study.
• The Publication Manual of
the American Psychological
Association also suggested
that in reporting
information about
participants, selection and
assignment to treatment
groups be included.
RELIABILITY
•Reliability in its simplest
definition refers to
consistency, often
meaning instrument
consistency.
Rater Reliability
The main defining characteristic of rater
reliability is that scores by two or more
raters or between one rater at Time X and
that same rater at Time Y are consistent.
In many instances, test scores are objective
and there is little judgment involved.
We want to make sure that our definition of LREs (language-related
episodes, or whatever construct we are dealing with) is sufficiently
specific to allow any researcher to identify them as such.
Interrater reliability begins with a well-defined construct.
It is a measure of whether two or more raters judge the same set
of data in the same way.
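One simple way to quantify this is the proportion of cases on which two raters agree. The sketch below assumes hypothetical LRE codings from two raters; more conservative statistics (e.g., Cohen's kappa) additionally correct for agreement expected by chance.

```python
# Sketch: simple interrater agreement -- the proportion of cases on which
# two raters assign the same judgment to the same data.
rater_1 = ["LRE", "no", "LRE", "LRE", "no", "no", "LRE", "no"]   # hypothetical codings
rater_2 = ["LRE", "no", "no",  "LRE", "no", "no", "LRE", "LRE"]

agreements = sum(a == b for a, b in zip(rater_1, rater_2))
percent_agreement = agreements / len(rater_1)
print(f"simple agreement: {percent_agreement:.2f}")
```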
Instrument
Reliability
Not only do we have to make sure that
our raters are judging what they
believe they are judging in a consistent
manner, we also need to ensure that
our instrument is reliable.
In this section, we consider three types
of reliability testing: test-retest,
equivalence of forms of a test (e.g.,
pretest and posttest), and internal
consistency.
Test-Retest
In a test-retest method of determining
reliability, the same test is given to the
same group of individuals at two points in
time. One must carefully determine the
appropriate time interval between test
administrations.
In order to arrive at a score by which
reliability can be established, one
determines the correlation coefficient
between the two test administrations.
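A minimal sketch, assuming hypothetical scores for eight learners tested twice; the Pearson correlation between the two administrations serves as the reliability estimate:

```python
# Sketch: test-retest reliability as the correlation between two
# administrations of the same test to the same individuals.
import numpy as np

time_1 = np.array([55, 62, 70, 48, 81, 67, 73, 59])  # hypothetical scores, first administration
time_2 = np.array([58, 60, 72, 50, 79, 69, 71, 61])  # same learners, second administration

r = np.corrcoef(time_1, time_2)[0, 1]
print(f"test-retest reliability: r = {r:.2f}")
```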
Equivalence
of Forms
There are times when it is
necessary to determine the
equivalence of two tests, as, for
example, in a pretest and a
posttest.
In this method of determining
reliability, two versions of a test
are administered to the same
individuals and a correlation
coefficient is calculated.
Internal
Consistency
It is not always possible or feasible
to administer tests twice to the
same group of individuals (whether
the same test or two different
versions).
Nonetheless, in such cases there are
statistical methods to determine
reliability: split-half, Kuder-Richardson
20 and 21, and Cronbach's α are
common ones. We provide a brief
description of each.
In the split-half procedure, reliability is determined by obtaining a
correlation coefficient comparing performance on half of a test
with performance on the other half.
A statistical adjustment (Spearman-Brown prophecy
formula) is generally made to determine the reliability of
the test as a whole. If the correlation coefficient is high, it
suggests that there is internal consistency to the test.
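A minimal sketch, assuming a hypothetical matrix of dichotomously scored items; the odd- and even-numbered items serve as the two halves, and the Spearman-Brown prophecy formula corrects the half-test correlation to full length:

```python
# Sketch: split-half reliability with the Spearman-Brown correction.
import numpy as np

# Hypothetical item matrix: rows = test-takers, columns = items (1 = correct).
items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1],
])

half_1 = items[:, ::2].sum(axis=1)   # scores on the odd-numbered items
half_2 = items[:, 1::2].sum(axis=1)  # scores on the even-numbered items

r_half = np.corrcoef(half_1, half_2)[0, 1]
r_full = 2 * r_half / (1 + r_half)   # Spearman-Brown prophecy formula
print(f"split-half r = {r_half:.2f}, corrected = {r_full:.2f}")
```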
Kuder-Richardson 20 and 21 are two approaches that are also used.
Although Kuder-Richardson 21 requires equal difficulty of the test
items, Kuder-Richardson 20 does not.
Both are calculated using information consisting of the number of
items, the mean, and the standard deviation. These are best used with
large numbers of items.
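A minimal sketch of KR-20 and KR-21, assuming the same kind of hypothetical item matrix and one common formulation of the formulas (sample variance of the total scores):

```python
# Sketch: Kuder-Richardson 20 and 21 for dichotomously scored items.
import numpy as np

# Hypothetical item matrix: rows = test-takers, columns = items (1 = correct).
items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1],
])
k = items.shape[1]                 # number of items
totals = items.sum(axis=1)         # total score per test-taker
mean, var = totals.mean(), totals.var(ddof=1)

p = items.mean(axis=0)             # proportion answering each item correctly
q = 1 - p

kr20 = (k / (k - 1)) * (1 - (p * q).sum() / var)
kr21 = (k / (k - 1)) * (1 - mean * (k - mean) / (k * var))
print(f"KR-20 = {kr20:.2f}, KR-21 = {kr21:.2f}")
```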
Cronbach's α is similar to the Kuder-
Richardson 20, but is used when the number
of possible answers is more than two.
Unlike Kuder-Richardson, Cronbach's α can
be applied to ordinal data.
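A minimal sketch of Cronbach's α, assuming hypothetical responses to four Likert-scale items:

```python
# Sketch: Cronbach's alpha for items with more than two possible answers
# (e.g., 5-point Likert-scale responses).
import numpy as np

# Hypothetical item matrix: rows = respondents, columns = scale items.
items = np.array([
    [4, 5, 3, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 4, 3, 3],
    [1, 2, 2, 1],
])
k = items.shape[1]
item_variances = items.var(axis=0, ddof=1)
total_variance = items.sum(axis=1).var(ddof=1)

alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")
```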