Teaching Material
PHYSICS EDUCATIONAL ASSESSMENT
BY:
KADEK AYU ASTITI, S. PD., M. PD.
NIP. 20140928 201404 2 002
Supported by:
PGMIPAU Funds, 2014
PHYSICS EDUCATION PROGRAM
MATHEMATICS AND SCIENCE DEPARTMENT
FACULTY OF TEACHER TRAINING AND EDUCATION
NUSA CENDANA UNIVERSITY
2014
PREFACE 
A notable concern of many teachers is that they frequently have the task of
constructing assessments to reflect on learning but have relatively little training or information
to rely on for this task. Assessment is central to teaching and learning. It is very
important for our students because it shows them where they are falling short. That is why
teachers should always discuss exams with students afterwards, to show them what the right
answers were and where they made mistakes. For the same reason, students must be given
their marks, and their exam scripts, as soon as possible. Assessment for learning focuses on
opportunities to develop students' ability to evaluate themselves, to make judgements
about their own performance, and to improve upon it. It makes use of authentic assessment
methods and offers many opportunities for students to develop their skills through formative
assessment, using summative assessment sparingly. To assess effectively, teachers
must understand the types of assessment, the types of assessment scales, methods of
test construction, and test validity and reliability. Each aspect is discussed in this sourcebook.
To help teachers assess, Part One presents the meaning and types of assessment,
addresses general test construction, and introduces the six levels of
intellectual understanding: knowledge, comprehension, application, analysis, synthesis, and
evaluation. These levels of understanding assist in categorizing test questions, with
knowledge as the lowest level. Part Two of the sourcebook is devoted to actual
test question construction and to tests of validity and reliability. Five test item types are discussed:
multiple choice, true/false, matching, completion, and essay. Information covers the
appropriate use of each item type, the advantages and disadvantages of each item type, and the
characteristics of well-written items. Suggestions for addressing higher-order thinking skills
for each item type are also presented. This sourcebook was developed to accomplish three
outcomes: 1) teachers will know the meaning of assessment and follow appropriate principles for
developing and using assessment methods in their teaching, avoiding common pitfalls in
student assessment; 2) teachers will be able to identify and accommodate the limitations of
different informal and formal assessment methods; 3) teachers will gain an awareness that
certain assessment approaches can be incompatible with certain instructional goals.
Kadek Ayu Astiti, S. Pd., M. Pd.
Contents
Preface
CHAPTER I TYPES OF ASSESSMENT
1.1 The Difference between Measurement, Assessment, and Evaluation
1.2 General Types of Assessment
1.3 Norm-Referenced Assessment and Criterion-Referenced Assessment
CHAPTER II OBJECTS OF LEARNING EVALUATION
2.1 Cognitive Learning Outcomes
2.2 Affective Learning Outcomes
2.3 Psychomotor Learning Outcomes
2.4 Types of Value Scales
CHAPTER III TEST ASSESSMENT
3.1 Objective
3.2 Essay
CHAPTER IV NON-TEST ASSESSMENT
4.1 Observation
4.2 Interview
4.3 Questionnaire
4.4 Portfolios
4.5 Project
CHAPTER V VALIDITY TEST
5.1 Content Validity
5.2 Criterion Validity
5.3 Construct Validity
CHAPTER VI RELIABILITY TEST
6.1 External Consistency Reliability
6.2 Internal Consistency Reliability
CHAPTER I
TYPES OF ASSESSMENT
Purpose: After learning this material, students are expected to:
- Be able to explain the definition of assessment
- Be able to explain the difference between measurement, assessment, and evaluation
- Be able to mention the types of assessment (summative and formative)
- Understand the concepts of criterion-referenced and norm-referenced frameworks
1.1. The Difference between Measurement, Assessment, and Evaluation
There is a lot of confusion over these three terms, as well as other terms associated
with measurement, assessment, and evaluation. The following is an explanation of each of
these terms:
Measurement, beyond its general definition, refers to the set of procedures, and the principles
for how to use those procedures, in educational tests and assessments; examples include
derived scores, standard scores, and the like. A measurement takes place when a "test" is given and a "score"
is obtained. If the test collects quantitative data, the score is a number. If the test collects
qualitative data, the score may be a phrase or word such as "excellent."
Assessment is a process by which information is obtained relative to some known objective
or goal. An assessment may include a test, but it also includes
methods such as observations, interviews, behavior monitoring, and so on.
Evaluation focuses on grades and may reflect classroom components other than course
content and mastery level. Evaluations are procedures used to determine whether the subject
(i.e. the student) meets preset criteria, such as qualifying for special education services. Evaluation
uses assessment (remember that an assessment may be a test) to make a determination of
qualification in accordance with predetermined criteria.
For the purpose of schematic representation, the three concepts of evaluation,
assessment, and measurement have traditionally been depicted as three concentric circles of
varying sizes. Figure 1.1 shows the relationship among these concepts.
Figure 1.1 The relationship among measurement, assessment, and evaluation (concentric
circles, with measurement innermost and evaluation outermost)
Assessment plays a major role in how students learn, their motivation to learn, and how
teachers teach. Assessment is used for various purposes.
• Assessment for learning: where assessment helps teachers gain insight into what
students understand in order to plan and guide instruction, and provides helpful
feedback to students.
• Assessment as learning: where students develop an awareness of how they learn and
use that awareness to adjust and advance their learning, taking increased
responsibility for their learning.
• Assessment of learning: where assessment informs students, teachers, and parents, as
well as the broader educational community, of achievement at a certain point in time,
in order to celebrate success, plan interventions, and support continued progress.
Assessment must be planned with its purpose in mind. Assessment for, as, and of learning all
have a role to play in supporting and improving student learning, and must be appropriately
balanced. The most important part of assessment is the interpretation and use of the
information that is gleaned for its intended purpose. Assessment is embedded in the learning
process. It is tightly interconnected with curriculum and instruction. As teachers and students
work towards the achievement of curriculum outcomes, assessment plays a constant role in
informing instruction, guiding the student's next steps, and checking progress and
achievement. Teachers use many different processes and strategies for classroom assessment,
and adapt them to suit the assessment purpose and the needs of individual students.
Table 1.1 Classroom assessment: from ... to ...
1. From classroom tests disconnected from the focus of instruction, to classroom tests
reflecting the written and taught curriculum.
2. From assessment using only selected response formats, to assessment methods selected
intentionally to reflect specific kinds of learning targets.
3. From mystery assessment, where students don't know in advance what they are
accountable for learning, to transparency in assessment, where students know in advance
what they will be held accountable for learning.
4. From all assessments and assignments, including practice, "counting" toward the grade,
to some assessments and assignments "counting" toward the grade while others are for
practice or other formative use.
5. From students as passive participants in the assessment process, to students as active
users of assessments as learning experiences.
6. From students not finding out until the graded event what they are good at and what
they need to work on, to students being able to identify their strengths and areas for
further study during learning.
1.2. General Types of Assessment
1.2.1. Summative assessment
Summative assessments are cumulative evaluations used to measure student growth
after instruction; they are generally given at the end of a course in order to determine whether
long-term learning goals have been met. Summative assessments provide
evidence of student achievement for the purpose of making a judgment about student
competence or program effectiveness. Typically, summative evaluation concentrates on
learner outcomes rather than only the program of instruction. It is a means of determining a
student's mastery and understanding of information, skills, concepts, and processes. Summative
assessments occur at the end of a formal learning experience, either a class or a program, and
may include a variety of activities, for example tests, demonstrations, portfolios, internships,
clinicals, and capstone projects. Summative assessment is a high-stakes type of assessment for the
purpose of making a final judgment about student achievement and instructional effectiveness.
By the time summative assessments occur, students have typically exited the learning mode.
Teachers and schools can use these assessments to identify strengths and weaknesses of
curriculum and instruction, with improvements affecting the next year's or term's students.
Summative assessments are given periodically to determine, at a particular point in time, what
students know and do not know. Many associate summative assessments only with
standardized tests such as state assessments, but they are also used at, and are an important
part of, district and classroom programs. Summative assessment at the district and classroom
level is an accountability measure that is generally used as part of the grading process. The list
is long, but here are some examples of summative assessments:
a) State assessments 
b) District benchmark or interim assessments 
c) End-of-unit or chapter tests 
d) End-of-term or semester exams 
e) Scores that are used for accountability of schools (AYP) and students (report card grades). 
The key is to think of summative assessment as a means to gauge, at a particular point 
in time, student learning relative to content standards. Although the information gleaned from 
this type of assessment is important, it can only help in evaluating certain aspects of the 
learning process. Because they are spread out and occur after instruction every few weeks, 
months, or once a year, summative assessments are tools to help evaluate the effectiveness of 
programs, school improvement goals, alignment of curriculum, or student placement in 
specific programs. Summative assessments happen too far down the learning path to provide 
information at the classroom level and to make instructional adjustments and interventions 
during the learning process. It takes formative assessment to accomplish this. The goal of 
summative assessment is to evaluate student learning at the end of an instructional unit by 
comparing it against some standard or benchmark. Information from summative assessments 
can be used formatively when students or faculty use it to guide their efforts and activities in 
subsequent courses. 
1.2.2. Formative assessment
Formative assessment is part of the instructional process and an
integral part of teaching and learning. It encompasses ongoing assessments, reviews,
and observations in a classroom. Teachers use formative assessment to improve instructional
methods and student feedback throughout the teaching and learning process. For example, if a
teacher observes that some students do not grasp a concept, she or he can design a review
activity or use a different instructional strategy. Likewise, students can monitor their progress
with periodic quizzes and performance tasks. The results of formative assessments are used to
modify and validate instruction. Formative assessment occurs in the short term, as learners are
in the process of making meaning of new content and of integrating it into what they already
know. When incorporated into classroom practice, it provides the information needed to
adjust teaching and learning while they are happening. In this sense, formative assessment
informs both teachers and students about student understanding at a point when timely
adjustments can be made. These adjustments help to ensure students achieve targeted standards-based
learning goals within a set time frame. Although formative assessment strategies appear
in a variety of formats, there are some distinct ways to distinguish them from summative
assessments. Formative assessment helps teachers determine next steps during the learning
process as the instruction approaches the summative assessment of student learning.
Some of the instructional strategies that can be used formatively include the following: 
1. Criteria and goal setting with students engages them in instruction and the learning 
process by creating clear expectations. In order to be successful, students need to 
understand and know the learning target/goal and the criteria for reaching it. Establishing 
and defining quality work together, asking students to participate in establishing norms of
behavior for classroom culture, and determining what should be included in criteria for
success are all examples of this strategy. Using student work, classroom tests, or 
exemplars of what is expected helps students understand where they are, where they need 
to be, and an effective process for getting there. 
2. Observations go beyond walking around the room to see if students are on task or 
need clarification. Observations assist teachers in gathering evidence of student learning 
to inform instructional planning. This evidence can be recorded and used as feedback for 
students about their learning or as anecdotal data shared with them during conferences. 
3. Questioning strategies should be embedded in lesson/unit planning. Asking better 
questions allows an opportunity for deeper thinking and provides teachers with 
significant insight into the degree and depth of understanding. Questions of this nature 
engage students in classroom dialogue that both uncovers and expands learning. An “exit 
slip” at the end of a class period to determine students’ understanding of the day’s lesson 
or quick checks during instruction such as “thumbs up/down” or “red/green” (stop/go) 
cards are also examples of questioning strategies that elicit immediate information about 
student learning. Helping students ask better questions is another aspect of this formative 
assessment strategy. 
4. Self and peer assessment helps to create a learning community within a classroom. 
Students who can reflect while engaged in metacognitive thinking are involved in their
learning. When students have been involved in criteria and goal setting, self-evaluation is 
a logical step in the learning process. With peer evaluation, students see each other as 
resources for understanding and checking for quality work against previously established 
criteria. 
5. Student record keeping helps students better understand their own learning as 
evidenced by their classroom work. This process of students keeping ongoing records 
of their work not only engages students, it also helps them, beyond a “grade,” to see 
where they started and the progress they are making toward the learning goal. All of these 
strategies are integral to the formative assessment process, and they have been suggested 
by models of effective middle school instruction. 
6. Balancing Assessment. As teachers gather information/data about student learning, 
several categories may be included. In order to better understand student learning, 
teachers need to consider information about the products (paper or otherwise) students 
create and tests they take, observational notes, and reflections on the communication that 
occurs between teacher and student or among students. When a comprehensive 
assessment program at the classroom level balances formative and summative student 
learning/achievement information, a clear picture emerges of where a student is relative 
to learning targets and standards. Students should be able to articulate this shared 
information about their own learning. When this happens, student-led conferences, a 
formative assessment strategy, are valid. The more we know about individual students as 
they engage in the learning process, the better we can adjust instruction to ensure that all 
students continue to achieve by moving forward in their learning. 
The goal of formative assessment is to monitor student learning to provide ongoing feedback 
that can be used by instructors to improve their teaching and by students to improve their 
learning. More specifically, formative assessments: 
• help students identify their strengths and weaknesses and target areas that need work 
• help faculty recognize where students are struggling and address problems 
immediately 
Formative assessments are generally low stakes, which means that they have low or no point 
value. Examples of formative assessments include asking students to: 
• draw a concept map in class to represent their understanding of a topic 
• submit one or two sentences identifying the main point of a lecture 
• turn in a research proposal for early feedback
1.3. Norm-Referenced Assessment and Criterion-Referenced Assessment
When we look at the types of assessment instruments, we can generally classify them
into two main groups: criterion-referenced assessments and norm-referenced assessments.
1.3.1. Norm-referenced assessment
Linn and Gronlund (2000) define a norm-referenced assessment as a test
or other type of assessment designed to provide a measure of performance that is interpretable
in terms of an individual's relative standing in some known group. Norm-referenced tests
allow us to compare a student's skills to those of others in his age group. Norm-referenced tests are
developed by creating the test items and then administering the test to a group of students that
will be used as the basis of comparison.
The essential characteristic of norm referencing is that students are awarded their
grades on the basis of their ranking within a particular cohort. Norm referencing involves
fitting a ranked list of students' 'raw scores' to a pre-determined distribution for awarding
grades. Usually, grades are spread to fit a 'bell curve' (a 'normal distribution' in statistical
terminology), either by qualitative, informal rough-reckoning or by statistical techniques of
varying complexity. For large student cohorts (such as in senior secondary education),
statistical moderation processes are used to adjust or standardise student scores to fit a normal
distribution. Norm-referenced standardized tests compare a student's performance to that of a
norming or sample group who are in the same grade or are the same age. Student performance
is communicated in percentile ranks, grade-equivalent scores, normal curve equivalents, scaled
scores, or stanine scores.
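To make the mechanics concrete, here is a minimal sketch in Python (an illustration, not
part of the sourcebook; the helper name norm_referenced_report and the sample cohort are
invented for the example). It converts a cohort's raw scores into the norm-referenced
quantities mentioned above: z-scores, percentile ranks, and stanines.

    from statistics import mean, pstdev

    def norm_referenced_report(raw_scores):
        # Convert raw scores to z-scores, percentile ranks, and stanines.
        mu, sigma = mean(raw_scores), pstdev(raw_scores)
        report = []
        for score in raw_scores:
            z = (score - mu) / sigma
            # Percentile rank: share of the cohort scoring below this student,
            # counting half of any ties, expressed as a percentage.
            below = sum(s < score for s in raw_scores)
            ties = sum(s == score for s in raw_scores)
            percentile = 100 * (below + 0.5 * ties) / len(raw_scores)
            # Stanine: the z-score mapped onto a 1-9 scale (mean 5, SD 2).
            stanine = min(9, max(1, round(2 * z + 5)))
            report.append((score, round(z, 2), round(percentile, 1), stanine))
        return report

    cohort = [42, 55, 61, 61, 68, 70, 74, 80, 88, 95]
    for row in norm_referenced_report(cohort):
        print(row)

Every derived score here depends on the cohort: change the other students' results and each
student's percentile rank and stanine change with them, which is exactly the property
discussed above.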
1.3.2. Criterion-referenced assessment
In criterion-referenced assessment, a student's performance is measured against a standard. One form
of criterion-referenced assessment is the benchmark, a description of a key task that students
are expected to be able to perform. In contrast to norm referencing, criterion-referenced assessment,
as the name implies, involves determining a student's grade by comparing his or her achievements with clearly
stated criteria for learning outcomes and clearly stated standards for particular levels of
performance. Linn and Gronlund (2000) define a criterion-referenced assessment as a test
or other type of assessment designed to provide a measure of performance
that is interpretable in terms of a clearly defined and delimited domain of learning tasks.
Unlike norm referencing, there is no pre-determined grade distribution to be generated, and a
student's grade is in no way influenced by the performance of others. Theoretically, all
students within a particular cohort could receive very high (or very low) grades depending
solely on the levels of individuals' performances against the established criteria and standards.
The goal of criterion referencing is to report student achievement against objective reference
points that are independent of the cohort being assessed. Criterion referencing can lead to
simple pass/fail grading schemes, such as in determining fitness to practice in professional
fields. Criterion referencing can also lead to reporting student achievement or progress on a
series of key criteria rather than as a single grade or percentage. Criterion referencing is worth
aspiring towards. It requires giving thought to expected learning
outcomes: it is transparent for students, and the grades derived should be defensible in
reasonably objective terms; students should be able to trace their grades to the specifics of
their performance on set tasks. Criterion referencing lays an important framework for student
engagement with the learning process and its outcomes.
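As a contrast with the norm-referenced sketch above, the following minimal Python sketch
(again an illustration only; the cut scores are invented for the example) shows
criterion-referenced grading: every student is graded against the same fixed standards,
regardless of how the rest of the cohort performs.

    # Assumed grade bands; a real course would publish its own criteria.
    CUT_SCORES = [(85, "A"), (70, "B"), (55, "C"), (40, "D")]

    def criterion_grade(score):
        # Return the grade whose cut score the student has reached.
        for cut, grade in CUT_SCORES:
            if score >= cut:
                return grade
        return "F"

    scores = [92, 78, 78, 66, 38]
    print([criterion_grade(s) for s in scores])  # ['A', 'B', 'B', 'C', 'F']

In principle the whole class could earn an A here, something a bell-curve model rules out by
construction.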
The distinction between criterion- and norm-referenced assessment is that criterion
referencing compares a student to a standard, while norm referencing compares a student to
others. The following differences between norm-referenced tests and criterion-referenced
tests are adapted from Popham (1975).
Table 1.2 Differences between criterion-referenced tests and norm-referenced tests

Purpose
- Criterion-referenced tests: To determine whether each student has achieved specific skills
or concepts; to find out how much students know before instruction begins and after it has
finished.
- Norm-referenced tests: To rank each student with respect to the achievement of others in
broad areas of knowledge; to discriminate between high and low achievers.

Content
- Criterion-referenced tests: Measure specific skills which make up a designated curriculum.
These skills are identified by teachers and curriculum experts. Each skill is expressed as an
instructional objective.
- Norm-referenced tests: Measure broad skill areas sampled from a variety of textbooks,
syllabi, and the judgments of curriculum experts.

Item characteristics
- Criterion-referenced tests: Each skill is tested by at least four items in order to obtain an
adequate sample of student performance and to minimize the effect of guessing. The items
which test any given skill are parallel in difficulty.
- Norm-referenced tests: Each skill is usually tested by fewer than four items. Items vary in
difficulty. Items are selected that discriminate between high and low achievers.

Score interpretation
- Criterion-referenced tests: Each individual is compared with a preset standard for
acceptable achievement; the performance of other examinees is irrelevant. A student's score
is usually expressed as a percentage. Student achievement is reported for individual skills.
- Norm-referenced tests: Each individual is compared with other examinees and assigned a
score, usually expressed as a percentile, a grade-equivalent score, or a stanine. Student
achievement is reported for broad skill areas, although some norm-referenced tests do
report student achievement for individual skills.
Which of these methods is preferable? Mostly, students' grades in universities are
decided by a mix of both methods, even though there may not be an explicit policy to do so.
In fact, the two methods are somewhat interdependent, more so than the brief explanations
above might suggest. Logically, norm referencing must rely on some initial criterion
referencing, since students' 'raw' scores must presumably be determined in the first instance
by assessors who have some objective criteria in mind. Criterion referencing, on the other
hand, appears more educationally defensible. But criterion referencing may be very difficult,
if not impossible, to implement in a pure form in many disciplines. It is not always possible to
be entirely objective and to comprehensively articulate criteria for learning outcomes: some
subjectivity in setting and interpreting levels of achievement is inevitable in higher education.
This being the case, sometimes the best we can hope for is to compare individuals'
achievements relative to their peers.
Norm referencing, on its own and if strictly and narrowly implemented, is undoubtedly
unfair. With norm referencing, a student's grade depends, to some extent at least, not only on
his or her level of achievement, but also on the achievement of other students. This might lead
to obvious inequities if applied without thought to any other considerations. For example, a
student who fails in one year may well have passed in another year! The potential for
unfairness of this kind is most likely in smaller student cohorts, where norm referencing may
force a spread of grades and exaggerate differences in achievement. Alternatively, norm
referencing might artificially compress the range of difference that actually exists.
Recognising, however, that some degree of subjectivity is inevitable in higher
education, it is also worthwhile to monitor grade distributions; in other words, to use a
modest process of norm referencing to watch the outcomes of a predominantly
criterion-referenced grading model. In doing so, if it is believed too many students are receiving low
grades, or too many students are receiving high grades, or the distribution is in some way
oddly spread, then this might suggest something is amiss and the assessment process needs
looking at. There may be, for instance, a problem with the overall degree of difficulty of the
assessment tasks: not enough challenging examination questions, or
assignment tasks that fail to discriminate between students with differing levels of knowledge
and skills. There might also be inconsistencies in the way different assessors are judging
student work. Best practice in grading in higher education involves striking a balance between
criterion referencing and norm referencing. This balance should be strongly oriented towards
criterion referencing as the primary and dominant principle.
Reference:
Bastanfar, A. (2009). Alternatives in assessment. Article.
http://www3.telus.net/linguisticsissues/alternatives
Garrison, C., & Ehringhaus, M. (2010). Formative and summative assessment in the
classroom. www.measuredprogress.
Linn, R. L., & Gronlund, N. E. (2000). Measurement and assessment in teaching (8th ed.).
Upper Saddle River, NJ: Prentice Hall.
Lynch, B. K. (2001). Rethinking assessment from a critical perspective. Language Testing,
18(4), 351–372.
Popham, W. J. (1975). Educational evaluation. Englewood Cliffs, NJ: Prentice-Hall, Inc.
CHAPTER II
OBJECTS OF LEARNING EVALUATION
Purpose: After learning this material, students are expected to:
- Understand what is measured in the cognitive aspect
- Understand what is measured in the affective aspect
- Understand what is measured in the psychomotor aspect
- Be able to explain the kinds of assessment scales
2.1. Cognitive Learning Outcomes
One of the objects of evaluation is the cognitive aspect of learning. Test
questions should focus on appropriate intellectual activity, ranging from simple recall to problem
solving, critical thinking, and reasoning. Cognitive complexity refers to the various levels of
learning that can be tested. A good test reflects the goals of the instruction. If the instructor is
mainly concerned with students memorizing facts, the test should ask for simple recall of
material. If the instructor is trying to develop analytic skills, a test that asks for recall is
inappropriate and will cause students to conclude that memorization is the instructor's true
goal.
In 1956, after extensive research on educational goals, a group of examiners published its
findings in a book edited by Dr. Benjamin S. Bloom of the University of Chicago. Bloom's
Taxonomy of Educational Objectives lists six levels of intellectual understanding:
• knowledge
• comprehension
• application
• analysis
• synthesis
• evaluation
Table 2.1 Cognitive complexity, adapted from Clay (2001)
Knowledge
Explanation: Recognizing and recalling information, including dates, events, persons, places;
terms, definitions; facts, principles, theories; methods and procedures.
Examples: Who invented the...? What is meant by...? Where is the...?

Comprehension
Explanation: Understanding the meaning of information, including restating (in own words);
translating from one form to another; or interpreting, explaining, and summarizing.
Examples: Restate in your own words...? Convert fractions into...? List three reasons for...?

Application
Explanation: Applying general rules, methods, or principles to a new situation, including
classifying something as a specific example of a general principle or using a formula to solve
a problem.
Examples: How is...an example of...? How is...related to...? Why is...significant?

Analysis
Explanation: Identifying the organization and patterns within a system by identifying its
component parts and the relationships among the components.
Examples: What are the parts of...? Classify...according to... Outline/diagram...

Synthesis
Explanation: Discovering/creating new connections, generalizations, patterns, or
perspectives; combining ideas to form a new whole.
Examples: What would you infer from...? What ideas can you add to...? How would you
create a...?

Evaluation
Explanation: Using evidence and reasoned argument to judge how well a proposal would
accomplish a particular purpose; resolving controversies or differences of opinion.
Examples: Do you agree...? How would you decide about...? What priority would you
give...?
2.2. Affective Learning Outcomes
Affective learning outcomes are learning outcomes related to interests, attitudes,
and values. The affective learning outcomes were developed by Krathwohl et al., as outlined
in "Handbook II: The Affective Domain". According to Krathwohl (in Mehrens and
Lehmann, 1973), the affective domain consists of: receiving, responding, valuing, organizing,
and characterizing.
Table 2.2 Affective domain guide, adapted from Clay (2001)
Receiving
If the student must: receive information about or give attention to this new attitude, value, or
belief.
Key words for objectives, assignments, and evaluations: be alert to, be aware of, be sensitive
to, experience, listen to, look at, perceive existence, receive information on, take notes on,
take notice of, willingly attends.

Responding
If the student must: participate in, or react to, this new attitude, value, or belief in a positive
manner.
Key words: allow others to, answer questions on, contribute to, cooperate with, dialog on,
discuss openly, enjoy doing, participate in, reply to, respect those who.

Valuing
If the student must: show some definite involvement in or commitment to this new attitude,
value, or belief.
Key words: accept as right, accept as true, affirm belief/trust in, associate himself with,
assume as true, consider valuable, decide based on, indicate agreement, influence others,
justify based on, seek out more detail.

Organizing
If the student must: integrate this new attitude, value, or belief with the existing organization
of attitudes, values, and beliefs, so that it has a position of priority and advocacy.
Key words: advocate, integrate into life, judge based on, place in value system, prioritize
based on, persuade others, systematize.

Characterizing
If the student must: fully internalize this new attitude, value, or belief so that it consistently
characterizes thought and action.
Key words: act based on, consistently carry out, consistently practice, fully internalize,
known by others as, characterized by, sacrifice for, view life based on.
2.3. Psychomotor Learning Outcomes
Psychomotor learning outcomes are learning outcomes related to motor skills and the
ability to act individually. Psychomotor behaviors are performed actions that are
neuromuscular in nature and demand certain levels of physical dexterity. This assessment is
suitable for assessing competences that require learners to perform a specific
task, for example an experiment in the laboratory. The taxonomy most often used for
psychomotor learning outcomes is Simpson's (Gronlund and Linn, 1990), which consists of
perception, set, guided response, mechanism, complex overt response, adaptation, and
origination.
Table 2.3 Psychomotor domain

Perception
Description: Awareness; the ability to use sensory cues to guide motor activity. This ranges
from sensory stimulation, through cue selection, to translation.
Examples of activity: Use and/or selection of the senses to absorb data for guiding
movement. Detects non-verbal communication cues. Estimates where a ball will land after it
is thrown and then moves to the correct location to catch it. Adjusts the heat of a stove to the
correct temperature by the smell and taste of the food. Adjusts the height of the forks on a
forklift by comparing where the forks are in relation to the pallet. "By the end of the music
theatre program, students will be able to relate types of music to particular dance steps."
Action verbs: chooses, describes, detects, differentiates, distinguishes, feels, hears, identifies,
isolates, notices, recognizes, relates, selects, separates, touches.

Set
Description: Readiness to act. It includes mental, physical, and emotional sets; these three
sets are dispositions that predetermine a person's response to different situations (sometimes
called mindsets).
Examples of activity: Mental, physical, or emotional preparation before an experience or
task. Knows and acts upon a sequence of steps in a manufacturing process. Recognizes one's
abilities and limitations. Shows desire to learn a new process (motivation). Note: this
subdivision of the psychomotor domain is closely related to the "responding to phenomena"
subdivision of the affective domain. "By the end of the physical education program, students
will be able to demonstrate the proper stance for batting a ball."
Action verbs: arranges, begins, displays, explains, gets set, moves, prepares, proceeds,
reacts, shows, states, volunteers, responds, starts.

Guided response
Description: Attempt. The early stage in learning a complex skill, including imitation and
trial and error. Adequacy of performance is achieved by practicing.
Examples of activity: Imitates or follows instruction; trial and error. Performs a
mathematical equation as demonstrated. Follows instructions to build a model. Responds to
the hand signals of an instructor while learning to operate a forklift. "By the end of the
physical education program, students will be able to perform a golf swing as demonstrated
by the instructor."
Action verbs: assembles, builds, calibrates, constructs, copies, dismantles, displays, dissects,
fastens, fixes, follows, grinds, heats, imitates, manipulates, measures, mends, mixes, reacts,
reproduces, responds, sketches, traces, tries.

Mechanism
Description: Basic proficiency; the ability to perform a complex motor skill. This is the
intermediate stage in learning a complex skill. Learned responses have become habitual and
the movements can be performed with some confidence and proficiency.
Examples of activity: Competently responds to a stimulus for action. Uses a personal
computer. Repairs a leaking faucet. Drives a car. "By the end of the biology program,
students will be able to assemble laboratory equipment appropriate for experiments."
Action verbs: assembles, builds, calibrates, completes, constructs, dismantles, displays,
fastens, fixes, grinds, heats, makes, manipulates, measures, mends, mixes, organizes,
performs, shapes, sketches.

Complex overt response
Description: Expert proficiency; the skillful performance of motor acts that involve complex
movement patterns. Proficiency is indicated by a quick, accurate, and highly coordinated
performance requiring a minimum of energy. This category includes performing without
hesitation and automatic performance. For example, players often utter sounds of
satisfaction or expletives as soon as they hit a tennis ball or throw a football, because they
can tell by the feel of the act what the result will produce.
Examples of activity: Executes a complex process with expertise. Maneuvers a car into a
tight parallel parking spot. Operates a computer quickly and accurately. Displays
competence while playing the piano. "By the end of the industrial education program,
students will be able to demonstrate proper use of woodworking tools to high school
students."
Action verbs: the same as for mechanism (assembles, builds, calibrates, constructs,
coordinates, demonstrates, dismantles, displays, dissects, fastens, fixes, grinds, heats,
manipulates, measures, mends, mixes, organizes, sketches), but with adverbs or adjectives
that indicate that the performance is quicker, better, more accurate, etc.

Adaptation
Description: Adaptable proficiency; a learner's ability to modify motor skills to fit a new
situation. Skills are well developed and the individual can modify movement patterns to fit
special requirements.
Examples of activity: Alters responses to reliably meet varying challenges. Responds
effectively to unexpected experiences. Modifies instruction to meet the needs of the learners.
Performs a task with a machine that it was not originally intended to do (the machine is not
damaged and there is no danger in performing the new task). "By the end of the industrial
education program, students will be able to adapt their lessons on woodworking skills for
disabled students."
Action verbs: adapts, adjusts, alters, changes, integrates, rearranges, reorganizes, revises,
solves, varies.

Origination
Description: Creative proficiency; a learner's ability to create new movement patterns to fit
a particular situation or specific problem. Learning outcomes emphasize creativity based
upon highly developed skills.
Examples of activity: Develops and executes new integrated responses and activities.
Constructs a new theory. Develops a new and comprehensive training program. Creates a
new gymnastic routine.
Action verbs: arranges, builds, combines, composes, constructs, creates, designs,
formulates, initiates, makes, modifies, originates, re-designs, trouble-shoots.
2.4. Types of Value Scales
Scales of measurement refer to the ways in which variables/numbers are defined and
categorized. Each scale of measurement has certain properties, which in turn determine the
appropriateness of certain statistical analyses. There are four measurement scales (or
types of data): nominal, ordinal, interval, and ratio.
2.4.1. Nominal value scale
A nominal value scale is a scale used to identify objects, individuals, or groups. A
questionnaire item answered yes (1) or no (0) is an example of a nominal value scale; such
values are the least like real numbers. Nominal basically refers to discrete categories such as
the name of your school, the type of car you drive, gender, religion, menu items selected, etc.
What is your gender?
M - Male
F - Female
Which recreational activities do you participate in?
1 - Hiking
2 - Fishing
3 - Boating
4 - Swimming
5 - Picnicking
A sub-type of nominal scale with only two categories (e.g. male/female) is called
"dichotomous." Nominal data can be clearly described in pie charts because they comprise
distinct categories that sum to 100%.
2.4.2. Ordinal value scale
An ordinal value scale is a scale that has a rank form: it orders
observations from low to high, with any ties attributed to a lack of measurement sensitivity,
e.g. scores from a questionnaire. For example: first rank, second rank, and so on. A questionnaire
with a Likert scale uses an ordinal value scale, such as disagree (1), doubtful (2), and agree (3).
Ordinal data refer to quantities that have a natural ordering: the ranking of favorite sports, the
order of people placed in a line, the order of runners finishing a race, or, more often, the choice
on a rating scale from 1 to 5. Further examples: class ranks, social class categories, etc.
Example:
How satisfied are you with our service?
1. Very unsatisfied
2. Unsatisfied
3. Neutral
4. Satisfied
5. Very satisfied
How do you feel today?
1. Very unhappy
2. Unhappy
3. OK
4. Happy
5. Very happy
2.4.3. Interval value scale
An interval value scale shares the properties of the nominal and ordinal scales, but it
also has a fixed, defined interval between values and can be used in mathematical operations.
Examples include how many times a woman goes to the market (once, twice, etc.)
or a final test score. Interval data are like ordinal data, except that the intervals between
values are equally split. The most common example is temperature in degrees Fahrenheit:
the difference between 29 and 30 degrees is the same magnitude as the difference between
78 and 79.
2.4.4. Ratio value scale
A ratio value scale is a true numerical scale: the distances between adjacent values are
equal, and values can be used in mathematical operations. Ratio scales are the easiest to
understand because they are numbers as we usually think of them. A score of zero on a ratio
scale means that there is none of whatever is being measured. Most ratio scales are counts of
things; ratio data are interval data with a natural zero point. Examples: weight, the distance
of a street, the time to complete a task, the size of an object, etc.
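The following minimal Python sketch (an illustration, not from the sourcebook; the sample
data are invented) shows why the scale type matters in practice: it determines which summary
statistics are meaningful.

    from statistics import mode, median, mean

    gender = ["M", "F", "F", "M", "F"]   # nominal: categories only
    likert = [1, 2, 2, 3, 3]             # ordinal: ordered, but gaps may be unequal
    temp_f = [29.0, 30.0, 78.0, 79.0]    # interval: equal gaps, no true zero
    weight = [48.5, 50.0, 61.2, 75.0]    # ratio: equal gaps and a true zero

    print(mode(gender))           # nominal data support only counts and the mode
    print(median(likert))         # ordinal data add the median and percentiles
    print(mean(temp_f))           # interval data add means and differences
    print(weight[3] / weight[0])  # only ratio data support ratios ("1.5 times heavier")

Note that the reverse statements do not hold: a mean of Likert codes, or a ratio of Fahrenheit
temperatures (80 degrees is not "twice as hot" as 40 degrees), is not meaningful, because those
scales lack equal intervals and a true zero respectively.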
Reference:
Clay, B. (2001). Is this a trick question? A short guide to writing effective test questions.
Kansas State Department of Education.
Garrison, C., & Ehringhaus, M. (1995). Formative and summative assessment in the
classroom.
Gronlund, N. E. (1981). Measurement and evaluation. New York: Macmillan Publishing Co.
Popham, W. J. (1981). Modern educational measurement. Englewood Cliffs, NJ:
Prentice-Hall, Inc.
Purwanto. (2008). Evaluasi hasil belajar. Surakarta: Pustaka Pelajar.
CHAPTER III
TEST ASSESSMENT
Purpose: After learning this material, students are expected to:
- Be able to explain test assessment
- Be able to describe the kinds of test assessment
- Understand the advantages and disadvantages of using tests
Figure 3.1 Alternative assessment: decision making in educational settings (tests and
measurement form a small quantitative area within the wider range of non-test procedures
and assessment by description)
As Figure 3.1 shows, tests constitute only a small set of options, among a wide range
of other options, for a language teacher to make decisions about students. The judgment
emanating from a test is not necessarily more valid or reliable than one deriving from
qualitative procedures, since both should meet reliability and validity criteria to be considered
informed decisions. The area circumscribed within quantitative decision making is relatively
small and represents a specific choice made by the teacher at a particular time in the course,
while the vast area outside, which covers all non-measurement qualitative assessment
procedures, represents the wider range of procedures and their general nature. This means that
the qualitative approaches, which result in descriptions of individuals, as contrasted with
quantitative approaches, which result in numbers, can go hand in hand with the teaching and
learning experiences in the class, and they can reveal more subtle shades of students'
proficiency.
A test is a method of measuring a person's ability, knowledge, or performance in completing
certain tasks or demonstrating mastery of a skill or knowledge of content. A test is a systematic
procedure for observing persons and describing them with either a numerical scale or a
category system. Thus a test may give either qualitative or quantitative information. Two types
of test are the objective test and the essay test. Essay tests are appropriate when:
• The group to be tested is small and the test is not to be reused.
• You wish to encourage and reward the development of student skill in
writing.
• You are more interested in exploring the student's attitudes than in
measuring his/her achievement.
Objective tests are appropriate when:
• The group to be tested is large and the test may be reused.
• Highly reliable scores must be obtained as efficiently as possible.
• Impartiality of evaluation, fairness, and freedom from possible test
scoring influences are essential.
Either essay or objective tests can be used to 1) measure almost any important
educational achievement a written test can measure, 2) test understanding and the ability to
apply principles, 3) test the ability to think critically, and 4) test the ability to solve problems.
3.1. Objective
Objective tests measure both your ability to remember facts and figures and your
understanding of course materials. These tests are often designed to make you think
independently, so don't count on recognizing the right answer. Instead, prepare yourself for
high-level critical reasoning and making fine discriminations to determine the best answer.
Taking an objective examination is somewhat different from taking an essay examination.
The objective examination may be composed of true/false, multiple choice, or matching
responses. Also included occasionally is a fill-in section. There are certain things that you
must remember to do as you take this kind of test. First, roughly decide how to divide your
time. Quickly glance over the pages to see how many kinds of questions are being used and
how many there are of each kind. Secondly, carefully read the instructions and make sure that
you understand them before you begin to work. Indicate your answers exactly as specified in
the instructions. If your instructor has not indicated whether there is a penalty for guessing,
ask him or her about it; then, if there is a penalty, do not guess.
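Such guessing penalties are usually implemented with the classic correction-for-guessing
formula, score = R - W/(k - 1), where R is the number right, W the number wrong, and k the
number of options per item. The following is a minimal Python sketch of that common formula
(an illustration; the sourcebook itself does not prescribe a particular scoring rule):

    def corrected_score(right, wrong, options_per_item):
        # Formula score R - W/(k - 1); omitted items are simply not counted.
        # Random guessing gains about one right answer per (k - 1) wrong ones,
        # so that expected gain is subtracted back out.
        return right - wrong / (options_per_item - 1)

    # A student with 30 right and 10 wrong on a 4-option multiple choice test:
    print(corrected_score(30, 10, 4))   # 26.66..., the expected guessing gain removed
    # On a true/false test (k = 2), every wrong answer cancels one right answer:
    print(corrected_score(30, 10, 2))   # 20.0

Under this rule, a student who guesses at random gains nothing on average, which is why
guessing does not pay when a penalty is announced.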
3.1.1. Multiple choice
A multiple choice test has items formatted as multiple choice questions, and the
candidate must choose which answer or group of answers is correct. The multiple choice
question consists of two parts: 1) the stem, the statement or question which identifies the
problem, and 2) the choices, the incorrect ones of which are known as distracters. Usually,
students are asked to select the one alternative that best completes a statement or answers a
question. Multiple choice items can also provide an excellent basis for post-test discussion,
especially if the discussion addresses why the incorrect responses were wrong as well as why
the correct responses were right. Unfortunately, multiple choice items are difficult and time
consuming to construct well. They may also appear too discriminating (picky) to students,
especially when the alternatives are well constructed, and are open to misinterpretation by
students who read more into questions than is there. Multiple choice tests can be used to test
the ability to:
1. Recall memorized information
2. Apply theory to routine cases
3. Apply theory to novel situations
4. Use judgment in analyzing and evaluating
Example of multiple choice:
A three-year-old child can usually be expected to:
a. Cry when separated from his or her mother
b. Have imaginary friends
c. Play with other children of the same age
d. Constantly argue with older siblings
3.1.2. True/false questions
True/false questions present candidates with a binary choice: a statement is either true or
false. The format invites guessing, a problem whose severity depends on the number of
questions. Also a popular question type, the true/false question has only two options. True/false
questions usually state the relation of two things to one another. Because the instructor is
interested in knowing whether you know when and under what circumstances something is or
is not true, s/he usually includes some qualifiers in the statement. The qualifiers must be
carefully considered. With the following qualifiers, you are wiser to guess "true" if you don't
know the answer, because you may stand some chance of getting the answer right: most, some,
usually, sometimes, and great. On the other hand, with these next qualifiers, you should guess
"false" unless you are certain that the statement is true: all, no, always, is, never, is not, good,
bad, equal, less.
The following are advantages of true or false tests:
• Can test large amounts of content 
• Students can answer 3-4 questions per minute 
And the disadvantages are: 
• They are easy 
• It is difficult to discriminate between students that know the material and students who 
do not 
• Students have a 50-50 chance of getting the right answer by guessing 
• Need a large number of items for high reliability 
Example of true or false questions:
1. Electrons are larger than molecules.
a. True b. False
2. True or false? The study of plants is known as botany.
a. True b. False
3. True or false? Is it recommended to take statements directly from the text to make
good true/false questions?
a. True b. False
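Why true/false tests need a large number of items for reliability can be made concrete with a
little probability. The following minimal Python sketch (an illustration, not from the
sourcebook; the 70% pass mark is an assumed example) computes the binomial probability of
reaching a pass mark by pure guessing:

    from math import comb

    def p_pass_by_guessing(n_items, pass_mark, p_guess=0.5):
        # Binomial probability of guessing at least pass_mark items correctly.
        return sum(comb(n_items, k) * p_guess**k * (1 - p_guess)**(n_items - k)
                   for k in range(pass_mark, n_items + 1))

    for n in (10, 20, 50):
        pass_mark = (7 * n + 9) // 10   # ceiling of 70% of the items
        print(n, round(p_pass_by_guessing(n, pass_mark), 4))
    # The chance of passing by luck alone drops from about 0.17 with
    # 10 items to under 0.01 with 50 items.

The 50-50 odds on any single item thus matter far less as the test grows, which is the reason
behind the "large number of items for high reliability" advice in the list above.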
3.1.3. Matching questions
A matching item provides defined terms and requires the test taker
to match identifying characteristics to the correct terms. Matching questions give students
some opportunity for guessing. Students must know the information well, in that they are
presented with two columns of items between which they must establish relationships;
however, if only one match is allowed per item, then once items have been eliminated, a few
of the latter ones may be guessed. A simple matching item consists of two
columns: one column of stems or problems to be answered, and another column of responses
from which the answers are to be chosen. Traditionally, the column of stems is placed on the
left and the column of responses is placed on the right.
Example: 
Directions: match the following! 
Water A. NaCl 
Discovered radium B. H2O 
Salt C. Fermi 
Ammonia D. NH3 
E. Curie 
3.1.4. Completion type
A completion (fill-in-the-blank) item provides the test taker with an identifying
characteristic and requires the test taker to recall the correct term. Completion items are
especially useful in assessing mastery of factual information when a specific word or phrase is
important to know. There are several varieties of completion item. The easiest version provides
a word bank of possible words that will fill in the blanks; in some exams, each word in the word
bank is used exactly once. If a teacher wanted to create a test of medium difficulty, they would
provide a test with a word bank, but some words may be used more than once and others not
at all. The hardest variety of such a test is a fill-in-the-blank test in which no word bank is
provided at all. This generally requires a higher level of understanding and memory than
multiple choice. Advantages:
• Good for who, what, where, when content
• Minimizes guessing
• Encourages more intensive study: the student must know the answer rather
than merely recognize it
• Can usually provide an objective measure of student achievement or
ability
Disadvantages:
• Difficult to assess higher levels of learning, because the answers to
completion items are usually limited to a few words
• Difficult to construct so that the desired response is clearly indicated
• May overemphasize memorization of facts
• Questions may have more than one correct answer
• Scoring is time consuming
A completion item requires the student to answer a question or to finish an incomplete
statement by filling in a blank with the correct word or phrase.
For example:
A subatomic particle with a negative electric charge is called a(n) ____________.
3.2. Essay
An essay test is a test that requires the student to compose responses, usually lengthy, up
to several paragraphs; essay tests measure higher-level thinking. A typical essay test usually
consists of a small number of questions for which the student is expected to recall and organize
knowledge in logical, integrated answers; the questions test higher-level processes such as
analysis, synthesis, evaluation, and creativity. The distinctive feature of the essay-type test is the
freedom of response: pupils are free to select, relate, and present ideas in their own words.
Items such as short answer or essay typically require a test taker to write a response that fulfills
the requirements of the item. In administrative terms, essay items take less time to construct.
As an assessment tool, essay items can test complex learning objectives as well as the processes
used to answer the questions. The items can also provide more realistic and generalizable tasks
for a test. Finally, these items make it difficult for test takers to guess the correct answers and
require test takers to demonstrate their writing skills as well as correct spelling and grammar.
Uses of essay tests:
a. Assess the ability to recall, organize, and integrate ideas.
b. Assess the ability to express oneself in writing.
c. Assess the ability to supply information.
d. Assess student understanding of subject matter.
e. Measure the knowledge of factual information.
The main advantages of essay and short answer items are that they permit students to
demonstrate achievement of such higher-level objectives as analyzing and critical thinking.
Written items offer students the opportunity to use their own judgment, writing styles, and
vocabularies. They are less time consuming to prepare than any other item type. Research
indicates that students study more efficiently for essay-type examinations than for selection
(multiple choice) tests. Students preparing for essay tests focus on broad issues, general
concepts, and interrelationships rather than on specific details. This studying results in
somewhat better student performance regardless of the type of exam they are given. Essay
tests also give the instructor an opportunity to comment on students' progress, the quality of
their thinking, the depth of their understanding, and the difficulties they may be having.
The following are the advantages of essay tests:
• Students less likely to guess 
• Easy to construct 
• Stimulates more study 
• Allows students to demonstrate ability to organize knowledge, express opinions, show 
originality. 
Disadvantages: 
• Can limit amount of material tested, therefore has decreased validity. 
• Subjective, potentially unreliable scoring. 
• Time consuming to score. 
Types of essay test:
3.2.1. Restricted response
The restricted response question usually limits both the content and the response. The
content is usually restricted by the scope of the topic to be discussed; limitations on the form
of the response are generally indicated in the question. Another way of restricting responses in
essay tests is to base the questions on specific problems. For this purpose, introductory
material like that used in interpretive exercises can be presented. Such items differ from
objective interpretive exercises only in that essay questions are used instead of
multiple choice or true/false items. Because the restricted response question is more
structured, it is most useful for measuring learning outcomes requiring the interpretation and
application of data in a specific area. Examples of restricted response questions: describe two
situations that demonstrate the application of the law of supply and demand; state any five
definitions of education.
Advantages of restricted response questions:
• They are more structured
• They measure specific learning outcomes
• They provide for more ease of assessment
• Any outcome measured by an objective interpretive exercise can be measured by
a restricted response question
3.2.2. Extended response
Extended response questions allow students to select the information that they think is
pertinent, to organize the answer in accordance with their best judgment, and to integrate and
evaluate ideas as they think suitable. No restriction is placed on the student as to the points he
will discuss or the type of organization he will use, and no limits are set on the length or
exact content to be discussed.
Teachers frame these types of questions so as to give students the maximum possible
freedom to determine the nature and scope of the response, within the bounds of the topic and
a stipulated time frame. The student may select the points he thinks are most important,
pertinent, and relevant, and may arrange and organize the answer in whichever way he wishes;
hence they are also called free response questions. This enables the teacher to judge students'
abilities to organize, integrate, and interpret material and to express themselves in their own
words. It also gives an opportunity to comment on and look into students' progress, the quality
of their thinking, the depth of their understanding, their problem-solving skills, and the
difficulties they may be having. These skills interact with each other and with the knowledge
and understanding the problem requires. Thus it is at the levels of synthesis and evaluation,
and in writing skills, that this type of question makes its greatest contribution. Examples:
1) Describe at length the defects of the present-day examination system in the state of
Maharashtra, and suggest ways and means of improving it. 2) Describe the character of
Hamlet. 3) "Global warming is the next step to disaster."
References:
Clay, B. (2001). Is This a Trick Question? A Short Guide to Writing Effective Test Questions.
Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological
Bulletin.
Garrison, C., & Ehringhaus, M. (1995). Formative and Summative Assessment in the Classroom.
Gronlund, N. E. (1981). Measurement and Evaluation. New York: Macmillan Publishing Co.
Popham, W. J. (1981). Modern Educational Measurement. Englewood Cliffs, NJ: Prentice Hall,
Inc.
Purwanto. (2008). Evaluasi Hasil Belajar. Surakarta: Pustaka Pelajar.
CHAPTER IV 
NON TEST ASSESSMENT 
Purpose: After learning this material, students are expected to:
- be able to explain non-test assessment
- be able to describe the kinds of non-test assessment
A non-test instrument is an instrument other than an academic achievement test. The
item-writing procedure for non-test instruments is the same as the procedure for writing items
for achievement tests: construct the test blueprint, write items according to the blueprint,
review the items, validate them, try them out, and refine them based on the results of the trials.
4.1. Observation
Observation should follow an established plan or checklist organized around concrete,
objective data, and it needs to be tied to the objectives of the course. Through observation,
teachers can assess their students' abilities simply by watching their classroom behavior or
completion of activities. By watching students as they work, teachers can identify signs of
struggle and determine where a child may be experiencing academic difficulties. Because
students often do not realize that they are being observed, teachers can ensure that the picture
they receive of student understanding represents the student's actual abilities. For most
practitioners, observation is a feature of everyday working life, and practitioners can often be
found with a notebook and pen close to hand to jot down unplanned observations that can be
added to normal recording systems at a later time. However, as previously discussed, specific
observations should be planned. Prior to beginning the observation, practitioners should work
through the stages outlined in the previous section and, as part of this process, select the most
appropriate observational method from the range available. It will also be helpful to produce a
cover sheet including such details as:
• child’s name 
• child’s age 
• date 
• name of observer 
• the specific setting or area of setting 
• permissions gained 
• aims and purpose of observation 
• start and finish times. 
4.2. Interview 
Interviews are the most frequently used method of personnel selection, but they are also
used for school admissions, promotions, scholarships, and other awards. Interviews vary in
their content and structure. In a structured interview, questions are prepared before the
interview starts. An unstructured interview simply represents a free conversation between an
interviewer and interviewee, giving the interviewer the freedom to adaptively or intuitively
switch topics. Research has shown that unstructured interviews lack predictive validity, or
show lower predictive validity than structured interviews. The best practices for conducting
interviews are:
• High degree of structure 
• Selection of questions according to job requirements 
• Assessment of aspects that cannot be better assessed with other methods 
• Scoring with pre-tested, behavior-anchored rating scales 
• Empirical examination of each question 
• Rating only after the interview 
• Standardized scoring 
• Training of interviewers 
Structured interviews can be divided into three types: 
a. Behavioral description interview involves questions that refer to past behavior in real 
situations, also referred to as job-related interview. 
b. Situational interview uses questions that require interviewees to imagine hypothetical 
situations (derived from critical incidents) and state how they would act in such 
situations. 
c. Multimodal interview combines the two approaches above and adds unstructured parts 
to ensure high respondent acceptance.
Analyses of the predictive validity of interviews for job performance have shown that they
are good predictors of job performance, that they add incremental validity above and beyond
general mental ability, and that behavioral description interviews show higher validity than
situational interviews. Interviews are less predictive of academic performance than of
job-related outcomes. Predictive validity probably also depends on the content of the
interview, but the available analyses aggregated interviews with different contents.
4.3. Questionnaire 
Questionnaires are the most commonly used method for collecting information from 
program participants when evaluating educational and extension programs. There are nine 
steps involved in the development of a questionnaire: 
1. Decide the information required. 
2. Define the target respondents. 
3. Choose the method(s) of reaching your target respondents. 
4. Decide on question content. 
5. Develop the question wording. 
6. Put questions into a meaningful order and format. 
7. Check the length of the questionnaire. 
8. Pre-test the questionnaire. 
9. Develop the final survey form. 
4.4. Portfolios
A portfolio is a collection of student work with a common theme or purpose. Like a
photographer's portfolio, it should contain the best examples of the student's work. For
subjects that are paper-based, collecting a portfolio is simple. Homework is a structured
practice exercise that usually plays a part in grading. Sometimes instructors assign reading or
other homework that covers the theoretical aspects of the subject matter, so that class time
can be used for more hands-on practical work. In a portfolio assessment, a teacher looks not at
one piece of work as a measure of student understanding, but instead at the body of work the
student has produced over a period of time. To allow for a portfolio assessment, a teacher
must compile student work throughout the term. This is commonly accomplished by
providing each student with a folder in which to store essays or other large activities. Upon
compilation of the portfolio, the teacher can review the body of work and determine the
degree to which the work indicates the student's understanding of the content.
Advantages of Portfolio Assessment 
• Assesses what students can do and not just what they know. 
• Engages students actively. 
• Fosters student-teacher communication and depth of exploration. 
• Enhances understanding of the educational process among parents and in the community. 
• Provides goals for student learning. 
• Offers an alternative to traditional tests for students with special needs. 
The use of the portfolio as an assessment tool is a process with multiple steps. The 
process takes time, and all of the component parts must be in place before the assessment can 
be utilized effectively. 
a. Decide on a purpose or theme. General assessment alone is not a sufficient goal for a 
portfolio. It must be decided specifically what is to be assessed. Portfolios are most useful 
for addressing the student’s ability to apply what has been learned. Therefore, a useful 
question to consider is, What skills or techniques do I want the students to learn to apply? 
The answer to this question can often be found in the school curriculum. 
b. Consider what samples to include. Think about what samples of student work might best illustrate the
application of the standard or educational goal in question. Written work samples, of 
course, come to mind. However, videotapes, pictures of products or activities, and 
testimonials are only a few of the many different ways to document achievement. 
c. Determine how samples will be selected. A range of procedures can be utilized here. 
Students, maybe in conjunction with parents and teachers, might select work to be 
included, or a specific type of sample might be required by the teacher, the school, or the 
school system. 
d. Decide whether to assess the process and the product or the product only. Assessing the 
process would require some documentation regarding how the learner developed the 
product. For example, did the student use the process for planning a short story or 
utilizing the experimental method that was taught in class? Was it used correctly? 
Evaluation of the process will require a procedure for accurately documenting the process 
used. The documentation could include a log or video of the steps or an interview with 
the student. Usually, if both the process and the product are to be evaluated, a separate 
scoring system will have to be developed for each.
e. Develop an appropriate scoring system. Usually this is best done through the use of a 
rubric, a point scale with descriptors that explain how the work will be evaluated. Points 
are allotted with the highest quality work getting the most points. If the descriptors are 
clear and specific, they become goals for which the student can aim. There should be a 
separate scale for each standard being evaluated. For example, if one standard being 
assessed is the use of grammatically correct sentence structure, five points might be 
allotted if all sentences are grammatically correct. Then, a specific number of errors 
would be identified for all other points with zero points given if there are more than a 
certain number of errors. It is important that the standards for evaluation be carefully 
explained. If we evaluate for clarity of writing, then an operational description of what is 
meant by clarity should be provided. The number of points available should be small enough
to be practical and meaningful; an allotment of 20 points for clarity is not workable because an
evaluator cannot really distinguish between a 17- and an 18-point product with regard to
clarity. (A minimal code sketch of such a rubric appears after this list.)
f. Share the scoring system with the students. Qualitative descriptors of how the student 
will be evaluated, known in advance, can guide learning and performance. 
g. Engage the learner in a discussion of the product. Through the process of discussion the 
teacher and the learner can explore the material in more depth, exchange feelings and 
attitudes with regard to the product and the learning process, and reap the greatest 
advantage of effective portfolio implementation. 
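To make the rubric idea concrete, here is a minimal sketch of rubric-based scoring in Python. The rubric structure, descriptor wording, point values, and function name are illustrative assumptions only, not a prescribed scheme; a real rubric would be derived from the standards actually being assessed.

```python
# A minimal rubric scorer: each standard has a small point scale with
# descriptors, and a portfolio sample receives one level per standard.
# The rubric contents below are invented for illustration.

rubric = {
    "sentence_structure": {  # standard being assessed
        5: "All sentences grammatically correct",
        3: "One or two grammatical errors",
        1: "More than two grammatical errors",
        0: "Pervasive grammatical errors",
    },
    "clarity": {
        3: "Main idea stated and supported throughout",
        2: "Main idea stated but unevenly supported",
        1: "Main idea difficult to identify",
    },
}

def score_sample(levels: dict) -> int:
    """Sum the points awarded per standard, validating against the rubric."""
    total = 0
    for standard, points in levels.items():
        if points not in rubric[standard]:
            raise ValueError(f"{points} is not a defined level for {standard}")
        total += points
    return total

# Example: a sample judged at level 3 for sentence structure, 2 for clarity.
print(score_sample({"sentence_structure": 3, "clarity": 2}))  # -> 5
```

Keeping a separate scale per standard, as the text recommends, is what lets the scorer report strengths and weaknesses rather than a single undifferentiated grade.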
4.5. Case Studies and Problem-Solving Assignments
Case studies and problem-solving assignments can be used to apply knowledge. This type
of assignment requires students to place themselves in, or react to, a situation where their
prior learning is needed to solve the problem or evaluate the situation. Case studies should be
realistic and practical, with clear instructions.
4.6. Project
Projects are usually designed so that students can apply many of the skills they have
developed in the course by producing a product of some kind. Usually project assignments
are given early in the course, with a completion date toward the end of the quarter. By asking
students to complete a project, teachers can see how well their pupils can apply taught
information. Successful completion of a project requires a student to translate their learning
into the completion of a task. Project-based assessment more closely approximates how
students will be assessed in the real world, as employers will not ask their employees to take
tests, but will instead judge their merit by the work they complete. A project is an example of
a performance task.
References:
Arvey, R. D., & Campion, J. E. (1982). The employment interview: A summary and review of
recent research. Personnel Psychology.
Clay, B. (2001). Is This a Trick Question? A Short Guide to Writing Effective Test Questions.
Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological
Bulletin.
Damiani, V. B. (2004). Portfolio Assessment in the Classroom. National Association of
School Psychologists.
Garrison, C., & Ehringhaus, M. (1995). Formative and Summative Assessment in the Classroom.
Gronlund, N. E. (1981). Measurement and Evaluation. New York: Macmillan Publishing Co.
Janz, T., Hellervik, L., & Gilmore, D. C. (1986). Behavior Description Interviewing (BDI).
Boston: Allyn & Bacon.
Latham, G. P., Saari, L. M., Pursell, E. D., & Campion, M. A. (1980). The situational
interview (SI). Journal of Applied Psychology.
Popham, W. J. (1981). Modern Educational Measurement. Englewood Cliffs, NJ: Prentice Hall,
Inc.
Purwanto. (2008). Evaluasi Hasil Belajar. Surakarta: Pustaka Pelajar.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in
personnel psychology. Psychological Bulletin.
Schuler, H. (2002). Das Einstellungsinterview (Multimodales Interview [MMI]). Göttingen:
Hogrefe.
CHAPTER V
VALIDITY TEST
Purpose:
- Be able to explain the definition of test validity
- Be able to explain the function of a validity test
- Be able to carry out a test of validity
Test validity is the extent to which a test (such as a chemical, physical, or scholastic test)
accurately measures what it purports to measure. Validity is divided into various kinds, such as
content validity, criterion validity, and construct validity.
5.1. Content Validity
Content validity is an estimate of how well a measure represents every element of a
construct; it is the extent to which the content of the test matches the instructional objectives.
For example, a semester or quarter exam that only includes content covered during the last six
weeks is not a valid measure of the course's overall objectives. It has very low content
validity.
5.2. Criterion Validity
Criterion validity assesses whether a test reflects a certain set of abilities. If the
criterion is obtained some time after the test is given, one is studying predictive validity. If
the test score and criterion score are determined at essentially the same time, one is studying
concurrent validity.
Concurrent validity measures the test against a benchmark test, and a high correlation
indicates that the test has strong criterion validity. In concurrent validity, we assess the
operationalization's ability to distinguish between groups that it should theoretically be able to
distinguish between. For example, if we come up with a way of assessing manic depression,
our measure should be able to distinguish between people who are diagnosed with manic
depression and those diagnosed as paranoid schizophrenic. If we want to assess the concurrent
validity of a new measure of empowerment, we might give the measure to both migrant farm
workers and farm owners, theorizing that our measure should show that the farm
owners are higher in empowerment. As in any discriminating test, the results are more
powerful if you are able to show that you can discriminate between two groups that are very
similar. If end-of-year math tests in 4th grade correlate highly with the statewide math
tests, they have high concurrent validity.
Predictive validity is a measure of how well a test predicts abilities. It involves testing
a group of subjects on a certain construct and then comparing them with results obtained at
some point in the future. In predictive validity, we assess the operationalization's ability to
predict something it should theoretically be able to predict. For instance, we might theorize
that a measure of math ability should be able to predict how well a person will do in an
engineering-based profession. We could give our measure to experienced engineers and see if
there is a high correlation between scores on the measure and their salaries as engineers. A
high correlation would provide evidence for predictive validity: it would show that our
measure can correctly predict something that we theoretically think it should be able to
predict.
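In practice, a criterion validity coefficient is usually just the correlation between test scores and criterion scores. The sketch below illustrates this with NumPy; the variable names and all scores are invented solely for illustration.

```python
import numpy as np

# Hypothetical data: a math-ability test score for ten people, and a
# criterion measured later (e.g. a performance rating). Invented numbers.
test_scores = np.array([62, 70, 75, 58, 80, 90, 66, 72, 85, 77])
criterion   = np.array([3.1, 3.4, 3.6, 2.9, 3.9, 4.5, 3.2, 3.5, 4.2, 3.8])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry
# is the Pearson correlation, read here as the predictive validity coefficient.
r = np.corrcoef(test_scores, criterion)[0, 1]
print(f"criterion validity coefficient r = {r:.2f}")
```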
5.3. Construct Validity
Construct validity is an assessment of how well you have translated your ideas or theories
into actual programs or measures; it defines how well a test or experiment measures up to its
claims. A test designed to measure depression must measure only that particular construct, not
closely related constructs such as anxiety or stress. Construct validity refers to the degree to
which inferences can legitimately be made from the operationalizations in your study to the
theoretical constructs on which those operationalizations were based. Like external validity,
construct validity is related to generalizing. But where external validity involves generalizing
from the context of your study to other people, places, or times, construct validity involves
generalizing from your program or measures to the concepts behind your program or
measures.
Convergent validity tests that constructs that are expected to be related are, in fact, 
related. In convergent validity, we examine the degree to which the operationalization is 
similar to (converges on) other operationalizations that it theoretically should be similar to. 
For instance, to show the convergent validity of a Head Start program, we might gather 
evidence that shows that the program is similar to other Head Start programs. Or, to show the 
convergent validity of a test of arithmetic skills, we might correlate the scores on our test 
with scores on other tests that purport to measure basic math ability, where high correlations 
would be evidence of convergent validity. 
Discriminant validity (also referred to as divergent validity) tests that constructs that
should have no relationship do, in fact, have no relationship. In discriminant validity, we
examine the degree to which the operationalization is not similar to (diverges from) other
operationalizations that it theoretically should not be similar to. For instance, to show the
discriminant validity of a Head Start program, we might gather evidence that shows that the
program is not similar to other early childhood programs that don't label themselves as Head
Start programs. Or, to show the discriminant validity of a test of arithmetic skills, we might
correlate the scores on our test with scores on tests of verbal ability, where low
correlations would be evidence of discriminant validity.
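A simple way to inspect convergent and discriminant evidence together is a correlation matrix over several measures: correlations between measures of the same construct should be high, and correlations with a measure of a theoretically unrelated construct should be low. The sketch below, with invented data, is one such check.

```python
import numpy as np

# Invented scores for 8 students on three measures: two arithmetic tests
# (same construct) and one verbal test (a different construct).
arith_a = np.array([55, 60, 72, 48, 80, 66, 74, 59])
arith_b = np.array([52, 63, 70, 50, 78, 64, 75, 57])
verbal  = np.array([63, 51, 61, 57, 58, 53, 55, 58])

corr = np.corrcoef([arith_a, arith_b, verbal])
print(f"arith_a vs arith_b (convergent, expect high): {corr[0, 1]:.2f}")
print(f"arith_a vs verbal  (discriminant, expect low): {corr[0, 2]:.2f}")
```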
References:
Clay, B. (2001). Is This a Trick Question? A Short Guide to Writing Effective Test Questions.
Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological
Bulletin.
Garrison, C., & Ehringhaus, M. (1995). Formative and Summative Assessment in the Classroom.
Gronlund, N. E. (1981). Measurement and Evaluation. New York: Macmillan Publishing Co.
Popham, W. J. (1981). Modern Educational Measurement. Englewood Cliffs, NJ: Prentice Hall,
Inc.
Purwanto. (2008). Evaluasi Hasil Belajar. Surakarta: Pustaka Pelajar.
CHAPTER VI
RELIABILITY TEST
Purpose:
- Be able to explain the definition of test reliability
- Be able to explain the function of a reliability test
- Be able to carry out a test of reliability
Reliability relates to the consistency of an assessment. Reliability is a necessary but
not sufficient condition for validity. For instance, if the needle of a scale sits five pounds
away from zero, it always over-reports my weight by five pounds: the measurement is
consistent, but it is consistently wrong, so the measurement is not valid. A reliable assessment
is one that consistently achieves the same results with the same (or a similar) cohort of students.
Various factors affect reliability, including ambiguous questions, too many options within a
question paper, vague marking instructions, and poorly trained markers. Reliability testing
methods can be divided into two groups: external consistency methods and internal
consistency methods.
6.1. External Consistency Reliability
Reliability as external consistency holds that a test is reliable if, after being administered
several times, it gives relatively consistent results. The methods included in this group are the
test-retest method and the parallel forms method.
Table 6.1 Test-Retest and Parallel Forms Methods

No | Method | Procedure | Technique
1 | Test-retest | The same test is given twice to the same students at different times | Product-moment correlation (between scores on test 1 and test 2)
2 | Parallel forms | Two similar (parallel) tests are given to the same group of learners | Product-moment correlation (between scores on instrument 1 and instrument 2)
6.1.1. Test-Retest Reliability
Test-retest reliability is used to assess the consistency of a measure from one time to
another: the reliability of an achievement test is estimated by administering the same test
repeatedly. The weakness of this method is that if the time interval is too short, learners may
still remember the material that was tested, so the second test result may be better than the
first.
The reliability coefficient in this case is simply the correlation between the scores
obtained by the same persons on the two administrations of the test. If the first test result
parallels the second, the test is said to be reliable. The analysis is done by finding the
correlation between the first and second test results, using the Pearson product-moment
correlation coefficient (r). The value of r always falls within the range -1 to +1.
Example:

No | Student's name | Score test 1 (X) | Score test 2 (Y)
1 | Agustina | 78 | 80
2 | Feby | 80 | 85
3 | Antoni | 77 | 80
4 | Chandra | 90 | 85
5 | Dionisius | 70 | 75
6 | Fitriani | 73 | 78
... | etc.
The formula:

$$r_{XY} = \frac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left\{N\sum X^{2} - \left(\sum X\right)^{2}\right\}\left\{N\sum Y^{2} - \left(\sum Y\right)^{2}\right\}}}$$

Description:
N = number of students
X = score on test 1
Y = score on test 2
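As a concrete check, the raw-score formula above can be computed directly. The sketch below uses the six example rows from the table; it is an added illustration, not part of the original material.

```python
import math

# Scores from the test-retest example above.
x = [78, 80, 77, 90, 70, 73]  # test 1
y = [80, 85, 80, 85, 75, 78]  # test 2

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Raw-score form of the Pearson product-moment correlation coefficient.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(f"test-retest reliability r = {r:.2f}")
```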
6.1.2. Parallel Forms Reliability
Parallel forms reliability is used to assess the consistency of the results of two tests
constructed in the same way from the same content domain. This method requires two sets of
questions that have the same goals, level of difficulty, and composition of material, but
different items; in other words, the two tests must be parallel. The reliability coefficient is
obtained by correlating the results of the first test with the results of the second.
Example:

No | Student's name | Result of instrument 1 (X) | Result of instrument 2 (Y)
1 | Fransiska | 78 | 80
2 | Johnson | 80 | 85
3 | Leona | 77 | 80
4 | Ratya | 90 | 85
5 | Febriyanti | 70 | 75
6 | Karmila | 73 | 78
... | etc.
The formula:

$$r_{XY} = \frac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left\{N\sum X^{2} - \left(\sum X\right)^{2}\right\}\left\{N\sum Y^{2} - \left(\sum Y\right)^{2}\right\}}}$$

Description:
N = number of students
X = score from instrument test 1
Y = score from instrument test 2
6.2. Internal Consistency Reliability
Reliability as internal consistency takes the view that a test is reliable if the measurement
results are consistent between test items. The test-retest and parallel forms methods have the
disadvantage that they are time consuming; in most cases the researcher wants to estimate
reliability from a single administration of a test. This requirement has led to measures of
internal consistency, or homogeneity. Internal consistency measures consistency within the
tool itself, and several internal consistency methods exist. All internal consistency
measurements have one thing in common: the measurement is based on the results of a single
administration. The split-half technique (using the Spearman-Brown, Rulon/Guttman, or
Flanagan formula) and Cronbach's Alpha method are commonly used to estimate internal
consistency reliability; in practice the calculations can be carried out with statistical software
such as SPSS, or with a spreadsheet.
6.2.1. Split-Half Reliability Method
In the split-half reliability method, the test is first divided into two equivalent halves,
and the correlation coefficient between scores on the two half-tests is found. This correlation
coefficient denotes the reliability of the half-test; the reliability coefficient of the whole
test is then estimated from it by various formulas. The measuring instrument can be divided
into two halves in a number of ways, but the usual way is to find the correlation coefficient
between scores on the odd-numbered and the even-numbered items. The whole-test
reliability can then be estimated using the following formulas:
a. Spearman-Brown Formula
The Spearman-Brown formula estimates the reliability of a test n times as long as one
for which a self-correlation is known; for split halves, n = 2. From the reliability of the
half-test, the reliability coefficient of the whole test is estimated by the following
Spearman-Brown formula:

$$r_{tt} = \frac{2\,r_{hh}}{1 + r_{hh}}$$

Where,
rtt = reliability of the total test estimated from the reliability of one of its halves (reliability
coefficient of the whole test)
rhh = self-correlation of a half-test (reliability coefficient of the half-test)
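The sketch below illustrates the odd/even split and the Spearman-Brown step on a small invented item-score matrix; the data and variable names are assumptions chosen purely for illustration.

```python
import numpy as np

# Invented item-score matrix: rows are 6 students, columns are 8 items
# (1 = correct), with items ordered roughly from easiest to hardest.
scores = np.array([
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
])

# Total the odd-numbered and even-numbered items for each student.
odd_half  = scores[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
even_half = scores[:, 1::2].sum(axis=1)   # items 2, 4, 6, 8

# Correlate the two half-tests, then step up with Spearman-Brown (n = 2).
r_hh = np.corrcoef(odd_half, even_half)[0, 1]
r_tt = 2 * r_hh / (1 + r_hh)
print(f"half-test r_hh = {r_hh:.2f}, whole-test r_tt = {r_tt:.2f}")
```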
b. Rulon/Guttman's Formula
An alternative method for finding split-half reliability was developed by Rulon. It requires
only the variance of the differences between each person's scores on the two half-tests
and the variance of the total scores. These two values are substituted into the following
formula, which yields the reliability of the whole test directly:

$$r_{tt} = 1 - \frac{SD_{d}^{2}}{SD_{x}^{2}}$$

Where,
rtt = reliability of the test
SDd = standard deviation of the difference scores
SDx = standard deviation of the scores on the whole test
c. Flanagan Formula
Flanagan gave a parallel formula for finding reliability using the split-half method.
Flanagan's formula is:

$$r_{tt} = 2\left(1 - \frac{SD_{1}^{2} + SD_{2}^{2}}{SD_{t}^{2}}\right)$$

Where,
rtt = reliability of the test
SD1 = standard deviation of the scores on the 1st half
SD2 = standard deviation of the scores on the 2nd half
SDt = standard deviation of the scores on the whole test
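Continuing the split-half sketch above, the Rulon and Flanagan estimates can be computed from the same half-test totals; as before, the numbers are invented for illustration, and population variances (NumPy's default ddof=0) are used consistently.

```python
import numpy as np

# Half-test totals carried over from the Spearman-Brown sketch above.
odd_half  = np.array([4, 3, 2, 2, 1, 1])
even_half = np.array([3, 3, 2, 1, 1, 0])
total = odd_half + even_half

# Rulon/Guttman: one minus the ratio of the variance of the half-score
# differences to the variance of the total scores.
diff = odd_half - even_half
r_rulon = 1 - diff.var() / total.var()

# Flanagan: based on the variances of the two halves and of the whole test.
r_flanagan = 2 * (1 - (odd_half.var() + even_half.var()) / total.var())
print(f"Rulon r = {r_rulon:.2f}, Flanagan r = {r_flanagan:.2f}")
```

The two estimates agree here, as expected: the Rulon and Flanagan formulas are algebraically equivalent ways of writing the same split-half reliability.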
6.2.2. Cronbach's Alpha Method
Cronbach's Alpha is mathematically equivalent to the average of all possible split-half
estimates. For a test of k items with item-score variances $SD_{i}^{2}$ and total-score
variance $SD_{t}^{2}$, it is given by

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i} SD_{i}^{2}}{SD_{t}^{2}}\right)$$

In practice, Cronbach's Alpha (α) is usually calculated with statistical software such as SPSS.
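A minimal implementation of the formula above, applied to the same invented item-score matrix used in the split-half sketches; the function name is my own choice, not a standard API.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an item-score matrix (rows = students,
    columns = items). Population variances (ddof=0) are used consistently."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0)        # variance of each item
    total_var = scores.sum(axis=1).var()  # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# The same invented 6-student, 8-item matrix as in the split-half sketches.
scores = np.array([
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
])
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")
```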
References:
Clay, B. (2001). Is This a Trick Question? A Short Guide to Writing Effective Test Questions.
Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological
Bulletin.
Garrison, C., & Ehringhaus, M. (1995). Formative and Summative Assessment in the Classroom.
Gronlund, N. E. (1981). Measurement and Evaluation. New York: Macmillan Publishing Co.
Popham, W. J. (1981). Modern Educational Measurement. Englewood Cliffs, NJ: Prentice Hall,
Inc.
Purwanto. (2008). Evaluasi Hasil Belajar. Surakarta: Pustaka Pelajar.
Curriculum Vitae
Kadek Ayu Astiti, S. Pd., M. Pd., was born in Singaraja on September 28, 1988.
She is the second child of the couple Ni Ketut Sudi and Made Suarsini.
Website address: www.kadekayuastiti.blogspot.com. History of
education: elementary school No. 6 Kampung Baru, Singaraja-Bali; SMP
Negeri 3 Singaraja-Bali; SMA N 1 Singaraja-Bali; S1 in Physics Education
at Ganesha University of Education; S2 in Science Education at Ganesha
University of Education. Employment history: laboratory staff at SMP N 1
Singaraja-Bali (2010-2011), teacher at SMP N 1 Singaraja-Bali (2011-2013), lecturer in
physics education courses at the University of Nusa Cendana (2014-present).

More Related Content

What's hot

New trends in evaluation v kamat
New trends in evaluation v kamatNew trends in evaluation v kamat
New trends in evaluation v kamatVasudha Kamat
 
Continuous Assessment
Continuous AssessmentContinuous Assessment
Continuous AssessmentManuel Reyes
 
Continuous Assessment System (CAS In Nepal)
Continuous Assessment System (CAS In Nepal)Continuous Assessment System (CAS In Nepal)
Continuous Assessment System (CAS In Nepal)Ravi Maharjan
 
Assessment Assumptions
Assessment AssumptionsAssessment Assumptions
Assessment AssumptionsJason Rhode
 
Continuous Assessment Component (CAC) of SEA
Continuous Assessment Component (CAC) of SEAContinuous Assessment Component (CAC) of SEA
Continuous Assessment Component (CAC) of SEAMoeEduTT
 
Examination and Evaluation
Examination and EvaluationExamination and Evaluation
Examination and Evaluationjagannath Dange
 
Understanding the concept of continuous and comprehensive evauation.
Understanding the concept of continuous and comprehensive evauation.Understanding the concept of continuous and comprehensive evauation.
Understanding the concept of continuous and comprehensive evauation.Sarvodaya Kanya Vidhyalaya
 
Assessment for learning
Assessment for learningAssessment for learning
Assessment for learningAtul Thakur
 
PHYSICS ASSESSMENT General Types of Assessment and The Types of Scales
PHYSICS ASSESSMENT General Types of Assessment and The Types of ScalesPHYSICS ASSESSMENT General Types of Assessment and The Types of Scales
PHYSICS ASSESSMENT General Types of Assessment and The Types of ScalesMillathina Puji Utami
 
The value of continuous assessment strategies in students’ learning of geogra...
The value of continuous assessment strategies in students’ learning of geogra...The value of continuous assessment strategies in students’ learning of geogra...
The value of continuous assessment strategies in students’ learning of geogra...Alexander Decker
 
Assessment of Student Learning
Assessment of Student LearningAssessment of Student Learning
Assessment of Student LearningDigiZen
 
The process and purpose of evaluation
The process and purpose of evaluationThe process and purpose of evaluation
The process and purpose of evaluationahmedabbas1121
 
teacher made test Vs standardized test
 teacher made test Vs standardized test teacher made test Vs standardized test
teacher made test Vs standardized testathiranandan
 
Assessment literacy for effective classroom-based assessment
Assessment literacy for effective classroom-based assessmentAssessment literacy for effective classroom-based assessment
Assessment literacy for effective classroom-based assessmentEddy White, Ph.D.
 

What's hot (20)

New trends in evaluation v kamat
New trends in evaluation v kamatNew trends in evaluation v kamat
New trends in evaluation v kamat
 
Assessment strategies
Assessment strategiesAssessment strategies
Assessment strategies
 
Continuous Assessment
Continuous AssessmentContinuous Assessment
Continuous Assessment
 
Pros and cons of school based assessment in india by dr. thanuja.k converted
Pros and cons of school based assessment in india by dr. thanuja.k convertedPros and cons of school based assessment in india by dr. thanuja.k converted
Pros and cons of school based assessment in india by dr. thanuja.k converted
 
Continuous Assessment System (CAS In Nepal)
Continuous Assessment System (CAS In Nepal)Continuous Assessment System (CAS In Nepal)
Continuous Assessment System (CAS In Nepal)
 
Assessment Assumptions
Assessment AssumptionsAssessment Assumptions
Assessment Assumptions
 
Concept of Classroom Assessment
Concept of Classroom AssessmentConcept of Classroom Assessment
Concept of Classroom Assessment
 
How, what and why of assessment, by dr. thanujakarimbana converted
How, what and why of assessment, by dr. thanujakarimbana convertedHow, what and why of assessment, by dr. thanujakarimbana converted
How, what and why of assessment, by dr. thanujakarimbana converted
 
Continuous Assessment Component (CAC) of SEA
Continuous Assessment Component (CAC) of SEAContinuous Assessment Component (CAC) of SEA
Continuous Assessment Component (CAC) of SEA
 
Examination and Evaluation
Examination and EvaluationExamination and Evaluation
Examination and Evaluation
 
Definition of Assessment,
Definition of Assessment,Definition of Assessment,
Definition of Assessment,
 
Understanding the concept of continuous and comprehensive evauation.
Understanding the concept of continuous and comprehensive evauation.Understanding the concept of continuous and comprehensive evauation.
Understanding the concept of continuous and comprehensive evauation.
 
Assessment for learning
Assessment for learningAssessment for learning
Assessment for learning
 
PHYSICS ASSESSMENT General Types of Assessment and The Types of Scales
PHYSICS ASSESSMENT General Types of Assessment and The Types of ScalesPHYSICS ASSESSMENT General Types of Assessment and The Types of Scales
PHYSICS ASSESSMENT General Types of Assessment and The Types of Scales
 
The value of continuous assessment strategies in students’ learning of geogra...
The value of continuous assessment strategies in students’ learning of geogra...The value of continuous assessment strategies in students’ learning of geogra...
The value of continuous assessment strategies in students’ learning of geogra...
 
Assessment of Student Learning
Assessment of Student LearningAssessment of Student Learning
Assessment of Student Learning
 
Educational evaluation -a brief conceptual overview
Educational evaluation -a brief conceptual overviewEducational evaluation -a brief conceptual overview
Educational evaluation -a brief conceptual overview
 
The process and purpose of evaluation
The process and purpose of evaluationThe process and purpose of evaluation
The process and purpose of evaluation
 
teacher made test Vs standardized test
 teacher made test Vs standardized test teacher made test Vs standardized test
teacher made test Vs standardized test
 
Assessment literacy for effective classroom-based assessment
Assessment literacy for effective classroom-based assessmentAssessment literacy for effective classroom-based assessment
Assessment literacy for effective classroom-based assessment
 

Similar to teaching material

Test Development and Evaluation
Test Development and Evaluation Test Development and Evaluation
Test Development and Evaluation HennaAnsari
 
ASSESSMENT IN LEARNING 1-LESSONS 1-4 (1).ppt
ASSESSMENT IN LEARNING 1-LESSONS 1-4 (1).pptASSESSMENT IN LEARNING 1-LESSONS 1-4 (1).ppt
ASSESSMENT IN LEARNING 1-LESSONS 1-4 (1).pptOscarAncheta
 
What-is-Educational-Assessment.pptx
What-is-Educational-Assessment.pptxWhat-is-Educational-Assessment.pptx
What-is-Educational-Assessment.pptxANIOAYRochelleDaoaya
 
2015 PGDT 423 (1).pptx
2015 PGDT 423 (1).pptx2015 PGDT 423 (1).pptx
2015 PGDT 423 (1).pptxsolomon554003
 
Evaluation in Education
Evaluation in Education Evaluation in Education
Evaluation in Education HennaAnsari
 
Assessment and Evaluation
Assessment and EvaluationAssessment and Evaluation
Assessment and EvaluationSuresh Babu
 
Assessment for learning by Dr. Goggi gupta
Assessment for learning by Dr. Goggi guptaAssessment for learning by Dr. Goggi gupta
Assessment for learning by Dr. Goggi guptagoggigupta
 
K to 12 classroom assessment ppt
K to 12 classroom assessment pptK to 12 classroom assessment ppt
K to 12 classroom assessment pptCarlo Magno
 
Assessment for learning chapter 1 - copy-converted
Assessment for learning chapter 1  - copy-convertedAssessment for learning chapter 1  - copy-converted
Assessment for learning chapter 1 - copy-convertedgoggigupta
 
Ict and assessment of learning
Ict and assessment of learningIct and assessment of learning
Ict and assessment of learningerwin marlon sario
 
Chapter 8 reporting by group 6 (autosaved) (autosaved)
Chapter 8 reporting by group 6 (autosaved) (autosaved)Chapter 8 reporting by group 6 (autosaved) (autosaved)
Chapter 8 reporting by group 6 (autosaved) (autosaved)Christine Watts
 
assessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptxassessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptxMarjorie Malveda
 
K to 12 Grading Sheet Deped Order No. 8 S. 2015 PPT presentation
K to 12 Grading Sheet Deped Order No. 8 S. 2015 PPT presentationK to 12 Grading Sheet Deped Order No. 8 S. 2015 PPT presentation
K to 12 Grading Sheet Deped Order No. 8 S. 2015 PPT presentationChuckry Maunes
 
CONTINIOUS AND COMPREHENSIVE ASSESSMENT IN EDUCATION
CONTINIOUS AND COMPREHENSIVE ASSESSMENT IN EDUCATION CONTINIOUS AND COMPREHENSIVE ASSESSMENT IN EDUCATION
CONTINIOUS AND COMPREHENSIVE ASSESSMENT IN EDUCATION Shisira Bania
 
Types of Assessment in Classroom
Types of Assessment in ClassroomTypes of Assessment in Classroom
Types of Assessment in ClassroomS. Raj Kumar
 

Similar to teaching material (20)

Unit 301 Essay
Unit 301 EssayUnit 301 Essay
Unit 301 Essay
 
Test Development and Evaluation
Test Development and Evaluation Test Development and Evaluation
Test Development and Evaluation
 
ASSESSMENT IN LEARNING 1-LESSONS 1-4 (1).ppt
ASSESSMENT IN LEARNING 1-LESSONS 1-4 (1).pptASSESSMENT IN LEARNING 1-LESSONS 1-4 (1).ppt
ASSESSMENT IN LEARNING 1-LESSONS 1-4 (1).ppt
 
What-is-Educational-Assessment.pptx
What-is-Educational-Assessment.pptxWhat-is-Educational-Assessment.pptx
What-is-Educational-Assessment.pptx
 
Essay On Assessment For Learning
Essay On Assessment For LearningEssay On Assessment For Learning
Essay On Assessment For Learning
 
2015 PGDT 423 (1).pptx
2015 PGDT 423 (1).pptx2015 PGDT 423 (1).pptx
2015 PGDT 423 (1).pptx
 
Evaluation in Education
Evaluation in Education Evaluation in Education
Evaluation in Education
 
Assessment and Evaluation
Assessment and EvaluationAssessment and Evaluation
Assessment and Evaluation
 
Assessment for learning by Dr. Goggi gupta
Assessment for learning by Dr. Goggi guptaAssessment for learning by Dr. Goggi gupta
Assessment for learning by Dr. Goggi gupta
 
Dup(01)portfolio (1)
Dup(01)portfolio (1)Dup(01)portfolio (1)
Dup(01)portfolio (1)
 
K to 12 classroom assessment ppt
K to 12 classroom assessment pptK to 12 classroom assessment ppt
K to 12 classroom assessment ppt
 
Types Of Assessment
Types Of AssessmentTypes Of Assessment
Types Of Assessment
 
Assessment for learning chapter 1 - copy-converted
Assessment for learning chapter 1  - copy-convertedAssessment for learning chapter 1  - copy-converted
Assessment for learning chapter 1 - copy-converted
 
Ict and assessment of learning
Ict and assessment of learningIct and assessment of learning
Ict and assessment of learning
 
language and literature assessment
language and literature assessmentlanguage and literature assessment
language and literature assessment
 
Chapter 8 reporting by group 6 (autosaved) (autosaved)
Chapter 8 reporting by group 6 (autosaved) (autosaved)Chapter 8 reporting by group 6 (autosaved) (autosaved)
Chapter 8 reporting by group 6 (autosaved) (autosaved)
 
assessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptxassessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptx
 
K to 12 Grading Sheet Deped Order No. 8 S. 2015 PPT presentation
K to 12 Grading Sheet Deped Order No. 8 S. 2015 PPT presentationK to 12 Grading Sheet Deped Order No. 8 S. 2015 PPT presentation
K to 12 Grading Sheet Deped Order No. 8 S. 2015 PPT presentation
 
CONTINIOUS AND COMPREHENSIVE ASSESSMENT IN EDUCATION
CONTINIOUS AND COMPREHENSIVE ASSESSMENT IN EDUCATION CONTINIOUS AND COMPREHENSIVE ASSESSMENT IN EDUCATION
CONTINIOUS AND COMPREHENSIVE ASSESSMENT IN EDUCATION
 
Types of Assessment in Classroom
Types of Assessment in ClassroomTypes of Assessment in Classroom
Types of Assessment in Classroom
 

teaching material

  • 1. Teaching material PHYSICS EDUCATIONAL ASSESMENT BY: KADEK AYU ASTITI, S. PD., M. PD. NIP. 20140928 201404 2 002 Support by: Dana PGMIPAU Tahun 2014 PHYSICS EDUCATION PROGRAM MATHEMATIC AND SCIENCE DEPARTMENT FACULTY OF TEACHER TRAINING AND EDUCATION NUSA CENDANA UNIVERSITY 2014
  • 2. PREFACE A notable concern of many teachers is that they frequently have the task of constructing assessment to reflect on learning but have relatively little training or information to rely on in this task. Assessment so important in teaching and learning. Assessment is very important for our students, because it shows them where they are falling short. That it why teachers should always discuss exams with students afterwards, to show them what the right answers were, and where they made mistakes. For the same reason, students must be given their marks, and their exam scripts, as soon as possible. Assessment for Learning focuses on the opportunities to develop students' ability to evaluate themselves, to make judgements about their own performance and improve upon it. It makes use of authentic assessment methods and offers lots of opportunities for students to develop their skills through formative assessment using summative assessment sparingly. To do an effectively assessment, so teacher must be understand the type of assessment, type of scale assessment, method of construct the test, validity and reliability tests. Each aspect is discussed in this sourcesbook. To help the teacher do assessing, so part one contains information the meaning and the type of assessment. Concerning general test construction and introduces the six levels of intellectual understanding: knowledge, comprehension, application, analysis, synthesis, and evaluation. These levels of understanding assist in categorizing test questions, with knowledge as the lowest level. Part Two of the information sourcebook is devoted to actual test question construction, test of validity and reliability. Five test item types are discussed: multiple choice, true-false, matching, completion, and essay. Information covers the appropriate use of each item type, advantages and disadvantages of each item type, and characteristics of well written items. Suggestions for addressing higher order thinking skills for each item type are also presented. This sourcebook was developed to accomplish three outcomes: 1) Teachers will know the meaning and follow appropriate principles for developing and using assessment methods in their teaching, avoiding common pitfalls in student assessment, 2) Teachers will be able to identify and accommodate the limitations of different informal and formal assessment methods. 3) Teachers will gain an awareness that certain assessment approaches can be incompatible with certain instructional goals. Kadek Ayu Astiti, S. Pd., M. Pd.
  • 3. Contens Preface i CHAPTER I TYPE OF ASSESSMENT 1 1.1. Difference measurement, assessment and evaluation 1.2 General Type of Assessment 1.3 Norm Referenced Assessment and Criterion Referenced Assessment CHAPTER II EVALUATION OF LEARNING OBJECTS 2.1 Cognitive Learning Outcomes 2.2 Affective Learning Outcomes 2.3 Psychomotor Learning Outcomes 2.4 The Type Value Scale CHAPTER III LEARNING ASSESSMENT 3.1 Objective 3.2 Essay CHAPTER IV NON TEST ASSESSMENT 4.1 Observation 4.2 Interview 4.3 Questionnare 4.4 Portofolios 4.5 Project CHAPTER V Validity Test 5.1 Content Validity 5.2 Criterion Validity 5.3 Construct Validity CHAPTER VI RELIABILITY TEST 6.1 External Consistency Reliability
  • 4. 6.2 Internal Consistency Reliability CHAPTER I TYPES OF ASSESSMENT Purpose: After learning this matter, students are expected to: - Be able to explain the definition of assessment - Be able to explain the different between measurement, assessment, evaluation - Mention the types of assessment (summative and formative) - Understand the concept of criterion referenced test and norm referenced framework 1.1. Difference measurement, assessment and evaluation There is a lot of confusion over these three terms as well as other terms associated with measurenment, assessment, and evaluation. The following is an understanding of each of these terms: Measurement, beyond its general definition, refers to the set of procedures and the principles for how to use the procedures in educational tests and assessments. Some of the derived scores, standard scores, etc. A measurement takes place when a “test” is given and a “score” is obtained. If the test collects quantitative data, the score is a number. If the test collects qualitative data, the score may be a phrase or word such as “excellent.” Assessment is a process by which information is obtained relative to some known objective or goal. As noted in my definition of test, an assessment may include a test, but also includes methods such as observations, interviews, behavior monitoring, etc. Evaluation: focuses on grades and may reflect classroom components other than course content and mastery level. Evaluation are procedures used to determine whether the subject (i.e. student) meets a preset criteria, such as qualifying for special education services. This uses assessment (remember that an assessment may be a test) to make a determination of qualification in accordance with a predetermined criteria.
  • 5. For the purpose of schematic representation, the three concepts of evaluation, measurement and testing have traditionally been demonstrated in three concentric circles of varying sizes. This is the relationship among these concepts. Evaluation Assessment Measurement Figure 1.1 relationship measurement, assessment and evaluation Assessment plays a major role in how students learn, their motivation to learn, and how teachers teach. Assessment is used for various purposes. • Assessment for learning: where assessment helps teachers gain insight into what students understand in order to plan and guide instruction, and provide helpful feedback to students. • Assessment as learning: where students develop an awareness of how they learn and use that awareness to adjust and advance their learning, taking an increased responsibility for their learning. • Assessment of learning: where assessment informs students, teachers and parents, as well as the broader educational community, of achievement at a certain point in time In order to celebrate success, plan interventions and support continued progress. Assessment must be planned with its purpose in mind. Assessment for, as and of learning all have a role to play in supporting and improving student learning, and must be appropriately balanced. The most important part of assessment is the interpretation and use of the information that is gleaned for its intended purpose. Assessment is embedded in the learning process. It is tightly interconnected with curriculum and instruction. As teachers and students
  • 6. work towards the achievement of curriculum outcomes, assessment plays a constant role in informing instruction, guiding the student’s next steps, and checking progress and achievement. Teachers use many different processes and strategies for classroom assessment, and adapt them to suit the assessment purpose and needs of individual student. Table 1.1 Classroom assessment: from … to … No From To 1 Classroom tests disconnected from the focus of instruction Classroom tests refecting the written and taught curriculum 2 Assessment using only selected respons formats Assessment method selected intentionally to reflect specific kinds of learning target 3 Mystery assessment, where students don’t know in advances what they are accountable for learning Transparency in assessments, where students know in advance what they will be held accountable for learning 4 All assessment and assignments, including practice, “count” toward the grade Some assessment an assignment “count” toward the grade, others are for practice or other formative use 5 Students as passive participant in the assessment process Students as active users of assessments as learning experiences 6 Students not finding out until the graded event what they are good at and what they need to work on Students being able to identify theirs strengths and areas for futher study during learning 1.2. General Type of Assessment 1.2.1. Summative assessment Summative assessment are cumulative evaluation used to measure student growth after instruction and are generally given at the end of a course in order to determine wheter long term learning goals have been met. Summative assessment is assessments that provide evidence of student achievement for the purpose of making a judgment about student competence or program effectiveness. Typically the summative evaluation concentrates on learner outcome rather than only the program of instruction. It is means to determine a student’s mastery and understanding of information, skills, concept and process. Summative assessment occur at the end of a formal learning experience. Either a class or a program and may include a variety of activities example test, demonstration, portofolios, internship, clinical, and capstone project. Summative assement is a high stakes type of assessment for the purpose of the making final judgment about student achivment and instructional effectiveness.
  • 7. By the time summative assessment occur student haved typically exit the learning mode. Teachers/schools can use these assessments to identify strengths and weaknesses of curriculum and instruction, with improvements affecting the next year's/term's students. Summative assessment are given periodically to determine at a particular point in time what students know and do not know. Many associate summative assessments only with standardized tests such as state assessments, but they are also used at and are an important part of district and classroom programs. Summative assessment at the district and classroom level is an accountability measure that is generally used as part of the grading process. The list is long, but here are some examples of summative assessments: a) State assessments b) District benchmark or interim assessments c) End-of-unit or chapter tests d) End-of-term or semester exams e) Scores that are used for accountability of schools (AYP) and students (report card grades). The key is to think of summative assessment as a means to gauge, at a particular point in time, student learning relative to content standards. Although the information gleaned from this type of assessment is important, it can only help in evaluating certain aspects of the learning process. Because they are spread out and occur after instruction every few weeks, months, or once a year, summative assessments are tools to help evaluate the effectiveness of programs, school improvement goals, alignment of curriculum, or student placement in specific programs. Summative assessments happen too far down the learning path to provide information at the classroom level and to make instructional adjustments and interventions during the learning process. It takes formative assessment to accomplish this. The goal of summative assessment is to evaluate student learning at the end of an instructional unit by comparing it against some standard or benchmark. Information from summative assessments can be used formatively when students or faculty use it to guide their efforts and activities in subsequent courses. 1.2.2. Formative assessment Formative Assessment is part of the instructional process. Formative assessment is an integral part of teaching and learning. Formative assessment ongoing assessments, reviews, and observations in a classroom. Teachers use formative assessment to improve instructional methods and student feedback throughout the teaching and learning process. For example, if a teacher observes that some students do not grasp a concept, she or he can design a review activity or use a different instructional strategy. Likewise, students can monitor their progress
  • 8. with periodic quizzes and performance tasks. The results of formative assessments are used to modify and validate instruction. Formative assessment occurs in the short term, as learners are in the process of making meaning of new content and of integrating it into what they already know. When in corporated into classroom practice, it providesthe information needed to adjust teaching and learning while they are happening. In this sense formative assessment informs both teachers and students about student understanding at a point when timely adjustment can be made. These adjustment help to ensure student achieve, targeted standards based learning goal within a set time frame. Although formative assessment strategies appear in a variety of formats, there are some distinct ways to distinguish them from summative assessments. Formative assessment helps teachers determine next steps during the learning process as the instruction approaches the summative assessment of student learning. Some of the instructional strategies that can be used formatively include the following: 1. Criteria and goal setting with students engages them in instruction and the learning process by creating clear expectations. In order to be successful, students need to understand and know the learning target/goal and the criteria for reaching it. Establishing and defining quality work together, asking students to participate in establishing norm behaviors for classroom culture, and determining what should be included in criteria for success are all examples of this strategy. Using student work, classroom tests, or exemplars of what is expected helps students understand where they are, where they need to be, and an effective process for getting there. 2. Observations go beyond walking around the room to see if students are on task or need clarification. Observations assist teachers in gathering evidence of student learning to inform instructional planning. This evidence can be recorded and used as feedback for students about their learning or as anecdotal data shared with them during conferences. 3. Questioning strategies should be embedded in lesson/unit planning. Asking better questions allows an opportunity for deeper thinking and provides teachers with significant insight into the degree and depth of understanding. Questions of this nature engage students in classroom dialogue that both uncovers and expands learning. An “exit slip” at the end of a class period to determine students’ understanding of the day’s lesson or quick checks during instruction such as “thumbs up/down” or “red/green” (stop/go) cards are also examples of questioning strategies that elicit immediate information about student learning. Helping students ask better questions is another aspect of this formative assessment strategy. 4. Self and peer assessment helps to create a learning community within a classroom. Students who can refect while engaged in metacognitive thinking are involved in their
  • 9. learning. When students have been involved in criteria and goal setting, self-evaluation is a logical step in the learning process. With peer evaluation, students see each other as resources for understanding and checking for quality work against previously established criteria. 5. Student record keeping helps students better understand their own learning as evidenced by their classroom work. This process of students keeping ongoing records of their work not only engages students, it also helps them, beyond a “grade,” to see where they started and the progress they are making toward the learning goal. All of these strategies are integral to the formative assessment process, and they have been suggested by models of effective middle school instruction. 6. Balancing Assessment. As teachers gather information/data about student learning, several categories may be included. In order to better understand student learning, teachers need to consider information about the products (paper or otherwise) students create and tests they take, observational notes, and reflections on the communication that occurs between teacher and student or among students. When a comprehensive assessment program at the classroom level balances formative and summative student learning/achievement information, a clear picture emerges of where a student is relative to learning targets and standards. Students should be able to articulate this shared information about their own learning. When this happens, student-led conferences, a formative assessment strategy, are valid. The more we know about individual students as they engage in the learning process, the better we can adjust instruction to ensure that all students continue to achieve by moving forward in their learning. The goal of formative assessment is to monitor student learning to provide ongoing feedback that can be used by instructors to improve their teaching and by students to improve their learning. More specifically, formative assessments: • help students identify their strengths and weaknesses and target areas that need work • help faculty recognize where students are struggling and address problems immediately Formative assessments are generally low stakes, which means that they have low or no point value. Examples of formative assessments include asking students to: • draw a concept map in class to represent their understanding of a topic • submit one or two sentences identifying the main point of a lecture • turn in a research proposal for early feedback
1.3. Norm Referenced Assessment and Criterion Referenced Assessment
When we look at the types of assessment instruments, we can generally classify them into two main groups: criterion-referenced assessments and norm-referenced assessments.
1.3.1. Norm Referenced Assessment
Linn and Gronlund (2000) define a norm-referenced assessment as a test or other type of assessment designed to provide a measure of performance that is interpretable in terms of an individual's relative standing in some known group. Norm-referenced tests allow us to compare a student's skills to those of others in his or her age group. Norm-referenced tests are developed by creating the test items and then administering the test to a group of students that will be used as the basis of comparison. The essential characteristic of norm referencing is that students are awarded their grades on the basis of their ranking within a particular cohort. Norm referencing involves fitting a ranked list of students’ ‘raw scores’ to a pre-determined distribution for awarding grades. Usually, grades are spread to fit a ‘bell curve’ (a ‘normal distribution’ in statistical terminology), either by qualitative, informal rough-reckoning or by statistical techniques of varying complexity. For large student cohorts (such as in senior secondary education), statistical moderation processes are used to adjust or standardise student scores to fit a normal distribution. Norm-referenced standardized tests compare a student's performance to that of a norming or sample group who are in the same grade or are the same age. Student performance is communicated in percentile ranks, grade equivalent scores, normal curve equivalents, scaled scores, or stanine scores.
1.3.2. Criterion Referenced Assessment
In a criterion-referenced assessment, a student's performance is measured against a standard. One form of criterion-referenced assessment is the benchmark, a description of a key task that students are expected to be able to perform. In contrast to norm referencing, criterion-referenced assessment, as the name implies, involves determining a student’s grade by comparing his or her achievements with clearly stated criteria for learning outcomes and clearly stated standards for particular levels of performance. Linn and Gronlund (2000) define a criterion-referenced assessment as a test or other type of assessment designed to provide a measure of performance that is interpretable in terms of a clearly defined and delimited domain of learning tasks.
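The contrast between the two approaches can be made concrete with a short sketch. The Python fragment below is illustrative only: the raw marks, grade quotas, and cut scores are assumed values, not prescribed ones. It grades the same set of raw scores twice, once by ranking students against a fixed distribution (norm referencing) and once by comparing each score with fixed standards (criterion referencing).

# Illustrative sketch: the same raw scores graded two ways.
# All marks, quotas, and cut scores below are assumptions for demonstration.

raw_scores = {"Ana": 58, "Ben": 72, "Cia": 66, "Dan": 81, "Eko": 74, "Fia": 69}

def norm_referenced(scores, quotas=(("A", 0.2), ("B", 0.3), ("C", 0.3), ("D", 0.2))):
    """Grades depend on a student's rank within the cohort."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    grades, start = {}, 0
    for grade, share in quotas:
        n = round(share * len(ranked))
        for name in ranked[start:start + n]:
            grades[name] = grade
        start += n
    for name in ranked[start:]:          # any remainder from rounding
        grades[name] = quotas[-1][0]
    return grades

def criterion_referenced(scores, cuts=((80, "A"), (70, "B"), (60, "C"), (0, "D"))):
    """Grades depend only on fixed standards, not on other students."""
    return {name: next(g for c, g in cuts if s >= c) for name, s in scores.items()}

print(norm_referenced(raw_scores))
print(criterion_referenced(raw_scores))

Note that under norm referencing a fixed share of students must receive each grade regardless of their absolute performance, which is exactly the fairness concern discussed later in this section.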
Unlike norm referencing, there is no pre-determined grade distribution to be generated, and a student’s grade is in no way influenced by the performance of others. Theoretically, all students within a particular cohort could receive very high (or very low) grades depending solely on the levels of individuals’ performances against the established criteria and standards. The goal of criterion referencing is to report student achievement against objective reference points that are independent of the cohort being assessed. Criterion referencing can lead to simple pass/fail grading schemes, such as in determining fitness to practice in professional fields. Criterion referencing can also lead to reporting student achievement or progress on a series of key criteria rather than as a single grade or percentage. Criterion referencing is worth aspiring towards. It requires giving thought to expected learning outcomes; it is transparent for students; and the grades derived should be defensible in reasonably objective terms: students should be able to trace their grades to the specifics of their performance on set tasks. Criterion referencing lays an important framework for student engagement with the learning process and its outcomes. The distinction between criterion- and norm-referenced assessments is that criterion referencing compares one to a standard, while norm referencing compares one to others. The following differences between norm-referenced tests and criterion-referenced tests are adapted from Popham (1975).
Table 1.2. Differences between Criterion Referenced Tests and Norm Referenced Tests
• Purpose
– Criterion referenced tests: To determine whether each student has achieved specific skills or concepts; to find out how much students know before instruction begins and after it has finished.
– Norm referenced tests: To rank each student with respect to the achievement of others in broad areas of knowledge; to discriminate between high and low achievers.
• Content
– Criterion referenced tests: Measure specific skills which make up a designated curriculum. These skills are identified by teachers and curriculum experts. Each skill is expressed as an instructional objective.
– Norm referenced tests: Measure broad skill areas sampled from a variety of textbooks, syllabi, and the judgments of curriculum experts.
• Item characteristics
– Criterion referenced tests: Each skill is tested by at least four items in order to obtain an adequate sample of student performance and to minimize the effect of guessing. The items which test any given skill are parallel in difficulty.
– Norm referenced tests: Each skill is usually tested by fewer than four items. Items vary in difficulty. Items are selected that discriminate between high and low achievers.
• Score interpretation
– Criterion referenced tests: Each individual is compared with a preset standard for acceptable achievement; the performance of other examinees is irrelevant. A student's score is usually expressed as a percentage. Student achievement is reported for individual skills.
– Norm referenced tests: Each individual is compared with other examinees and assigned a score, usually expressed as a percentile, a grade equivalent score, or a stanine. Student achievement is reported for broad skill areas, although some norm-referenced tests do report student achievement for individual skills.
Which of these methods is preferable? Mostly, students’ grades in universities are decided by a mix of both methods, even though there may not be an explicit policy to do so. In fact, the two methods are somewhat interdependent, more so than the brief explanations above might suggest. Logically, norm referencing must rely on some initial criterion referencing, since students’ ‘raw’ scores must presumably be determined in the first instance by assessors who have some objective criteria in mind. Criterion referencing, on the other hand, appears more educationally defensible. But criterion referencing may be very difficult, if not impossible, to implement in a pure form in many disciplines. It is not always possible to be entirely objective and to comprehensively articulate criteria for learning outcomes: some subjectivity in setting and interpreting levels of achievement is inevitable in higher education. This being the case, sometimes the best we can hope for is to compare individuals’ achievements relative to their peers. Norm referencing, on its own and if strictly and narrowly implemented, is undoubtedly unfair. With norm referencing, a student’s grade depends to some extent not only on his or her level of achievement, but also on the achievement of other students. This might lead to obvious inequities if applied without thought to any other considerations. For example, a student who fails in one year may well have passed in another year! The potential for unfairness of this kind is most likely in smaller student cohorts, where norm referencing may force a spread of grades and exaggerate differences in achievement. Alternatively, norm referencing might artificially compress the range of difference that actually exists. Recognising, however, that some degree of subjectivity is inevitable in higher education, it is also worthwhile to monitor grade distributions: in other words, to use a modest process of norm referencing to watch the outcomes of a predominantly criterion-referenced grading model. In doing so, if it is believed too many students are receiving low grades, or too many students are receiving high grades, or the distribution is in some way oddly spread, then this might suggest something is amiss and the assessment process needs
looking at. There may be, for instance, a problem with the overall degree of difficulty of the assessment tasks: not enough challenging examination questions, or too few, or assignment tasks that fail to discriminate between students with differing levels of knowledge and skills. There might also be inconsistencies in the way different assessors are judging student work. Best practice in grading in higher education involves striking a balance between criterion referencing and norm referencing. This balance should be strongly oriented towards criterion referencing as the primary and dominant principle.
Reference:
Bastanfar, A. (2009). Alternatives in assessment. Article. http://www3.telus.net/linguisticsissues/alternatives.
Garrison, C., & Ehringhaus, M. (2010). Formative and summative assessment in the classroom. www.measuredprogress.
Linn, R. L., & Gronlund, N. E. (2000). Measurement and assessment in teaching (8th ed.). Upper Saddle River, NJ: Prentice Hall.
Lynch, B. K. (2001). Rethinking assessment from a critical perspective. Language Testing, 18(4), 351–372.
Popham, W. J. (1975). Educational evaluation. Englewood Cliffs, NJ: Prentice-Hall, Inc.
CHAPTER II
EVALUATION OF LEARNING OBJECTS
Purpose: After learning this material, students are expected to:
- understand what is measured in cognitive aspects
- understand what is measured in affective aspects
- understand what is measured in psychomotor aspects
- be able to explain the kinds of assessment scales
2.1. Cognitive Learning Outcomes
One of the objects of the evaluation of learning outcomes is the cognitive aspect. Test questions should focus on appropriate intellectual activity, ranging from simple recall to problem solving, critical thinking, and reasoning. Cognitive complexity refers to the various levels of learning that can be tested. A good test reflects the goals of the instruction. If the instructor is mainly concerned with students memorizing facts, the test should ask for simple recall of material. If the instructor is trying to develop analytic skills, a test that asks for recall is inappropriate and will cause students to conclude that memorization is the instructor's true goal. In 1956, after extensive research on educational goals, a group of researchers published its findings in a book edited by Dr. Benjamin S. Bloom of the University of Chicago. Bloom’s Taxonomy of Educational Objectives lists six levels of intellectual understanding:
• Knowledge
• Comprehension
• Application
• Analysis
• Synthesis
• Evaluation
Table 2.1. Cognitive complexity, adapted from Clay (2001)
• Knowledge: recognizing and recalling information, including dates, events, persons, places; terms, definitions; facts, principles, theories; methods and procedures. Example questions: Who invented the…? What is meant by…? Where is the…?
• Comprehension: understanding the meaning of information, including restating (in one's own words); translating from one form to another; or interpreting, explaining, and summarizing. Example questions: Restate in your own words…? Convert fractions into…? List three reasons for…?
• Application: applying general rules, methods, or principles to a new situation, including classifying something as a specific example of a general principle or using a formula to solve a problem. Example questions: How is… an example of…? How is… related to…? Why is… significant?
• Analysis: identifying the organization and patterns within a system by identifying its component parts and the relationships among the components. Example questions: What are the parts of…? Classify… according to… Outline/diagram…
• Synthesis: discovering/creating new connections, generalizations, patterns, or perspectives; combining ideas to form a new whole. Example questions: What would you infer from…? What ideas can you add to…? How would you create a…?
• Evaluation: using evidence and reasoned argument to judge how well a proposal would accomplish a particular purpose; resolving controversies or differences of opinion. Example questions: Do you agree…? How would you decide about…? What priority would you give…?
2.2. Affective Learning Outcomes
Affective learning outcomes are learning outcomes related to students' interests, attitudes, and values. The taxonomy of affective learning outcomes was developed by Krathwohl et al., as outlined in Handbook II: The Affective Domain. According to Krathwohl (in Mehrens and Lehmann, 1973), the affective domain consists of: receiving, responding, valuing, organization, and characterization.
Table 2.2. Affective domain guide, adapted from Clay (2001)
• Receiving. If the student must receive information about, or give attention to, a new attitude, value, or belief, then use these key words in objectives, assignments, and evaluations: be alert to, be aware of, be sensitive to, experience, listen to, look at, perceive the existence of, receive information on, take notes on, take notice of, willingly attend.
• Responding. If the student must participate in, or react to, the new attitude, value, or belief in a positive manner: allow others to, answer questions on, contribute to, cooperate with, dialog on, discuss openly, enjoy doing, participate in, reply to, respect those who.
• Valuing. If the student must show some definite involvement in, or commitment to, the new attitude, value, or belief: accept as right, accept as true, affirm belief/trust in, associate himself or herself with, assume as true, consider valuable, decide based on, indicate agreement, influence others, justify based on, seek out more detail.
• Organizing. If the student must integrate the new attitude, value, or belief with the existing organization of attitudes, values, and beliefs, so that it has a position of priority and advocacy: advocate, integrate into life, judge based on, place in value system, prioritize based on, persuade others, systematize.
• Characterization. If the student must fully internalize the new attitude, value, or belief so that it consistently characterizes thought and action: act based on, consistently carry out, consistently practice, fully internalize, be known by others as, be characterized by, sacrifice for, view life based on.
2.3. Psychomotor Learning Outcomes
Psychomotor learning outcomes are learning outcomes related to motor skills and the ability to act individually. Psychomotor behaviors are performed actions that are neuromuscular in nature and demand certain levels of physical dexterity. This kind of assessment is suitable for assessing the achievement of competences that demand that learners perform a specific task, for example an experiment in the laboratory. The taxonomy most often used for psychomotor learning outcomes is Simpson's (Gronlund and Linn, 1990). That taxonomy comprises perception, set, guided response, mechanism, complex overt response, adaptation, and origination.
Table 2.3. Psychomotor domain
• Perception: awareness; the ability to use sensory cues to guide motor activity. This ranges from sensory stimulation, through cue selection, to translation: the use and/or selection of the senses to absorb data for guiding movement. Examples: detects non-verbal communication cues; estimates where a ball will land after it is thrown and moves to the correct location to catch it; adjusts the heat of a stove to the correct temperature by the smell and taste of the food; adjusts the height of the forks on a forklift by comparing where the forks are in relation to the pallet. Objective: “By the end of the music theatre program, students will be able to relate types of music to particular dance steps.” Action verbs: chooses, describes, detects, differentiates, distinguishes, feels, hears, identifies, isolates, notices, recognizes, relates, selects, separates, touches.
• Set: readiness to act. It includes mental, physical, and emotional sets; these three sets are dispositions that predetermine a person’s response to different situations (sometimes called mindsets), i.e. mental, physical, or emotional preparation before an experience or task. Examples: knows and acts upon a sequence of steps in a manufacturing process; recognizes one’s abilities and limitations; shows desire to learn a new process (motivation). Note: this subdivision of the psychomotor domain is closely related to the “responding to phenomena” subdivision of the affective domain. Objective: “By the end of the physical education program, students will be able to demonstrate the proper stance for batting a ball.” Action verbs: arranges, begins, displays, explains, gets set, moves, prepares, proceeds, reacts, shows, states, volunteers, responds, starts.
• Guided response: attempt. The early stage in learning a complex skill, which includes imitation and trial and error; the learner imitates or follows instruction, and adequacy of performance is achieved by practicing. Examples: performs a mathematical equation as demonstrated; follows instructions to build a model; responds to the hand signals of the instructor while learning to operate a forklift. Objective: “By the end of the physical education program, students will be able to perform a golf swing as demonstrated by the instructor.” Action verbs: assembles, builds, calibrates, constructs, copies, dismantles, displays, dissects, fastens, fixes, follows, grinds, heats, imitates, manipulates, measures, mends, mixes, reacts, reproduces, responds, sketches, traces, tries.
• Mechanism: basic proficiency; the ability to perform a complex motor skill. This is the intermediate stage in learning a complex skill. Learned responses have become habitual, and the movements can be performed with some confidence and proficiency; the learner responds competently to a stimulus for action. Examples: uses a personal computer; repairs a leaking faucet; drives a car. Objective: “By the end of the biology program, students will be able to assemble laboratory equipment appropriate for experiments.” Action verbs: assembles, builds, calibrates, completes, constructs, dismantles, displays, fastens, fixes, grinds, heats, makes, manipulates, measures, mends, mixes, organizes, performs, shapes, sketches.
• Complex overt response: expert proficiency; the skillful performance of motor acts that involve complex movement patterns. Proficiency is indicated by a quick, accurate, and highly coordinated performance requiring a minimum of energy. This category includes performing without hesitation and automatic performance; for example, players often utter sounds of satisfaction or expletives as soon as they hit a tennis ball or throw a football, because they can tell by the feel of the act what the result will produce. The learner executes a complex process with expertise. Examples: maneuvers a car into a tight parallel parking spot; operates a computer quickly and accurately; displays competence while playing the piano. Objective: “By the end of the industrial education program, students will be able to demonstrate proper use of woodworking tools to high school students.” Action verbs: assembles, builds, calibrates, constructs, coordinates, demonstrates, dismantles, displays, dissects, fastens, fixes, grinds, heats, manipulates, measures, mends, mixes, organizes, sketches. Note: the key words are the same as for Mechanism, but with adverbs or adjectives indicating that the performance is quicker, better, more accurate, etc.
• Adaptation: adaptable proficiency; the ability to modify motor skills to fit a new situation. Skills are well developed, and the individual can modify movement patterns to fit special requirements, altering the response to reliably meet varying challenges. Examples: responds effectively to unexpected experiences; modifies instruction to meet the needs of the learners; performs a task with a machine that it was not originally intended to do (the machine is not damaged and there is no danger in performing the new task). Objective: “By the end of the industrial education program, students will be able to adapt their lessons on woodworking skills for disabled students.” Action verbs: adapts, adjusts, alters, changes, integrates, rearranges, reorganizes, revises, solves, varies.
• Origination: creative proficiency; the ability to create new movement patterns to fit a particular situation or specific problem. Learning outcomes emphasize creativity based upon highly developed skills; the learner develops and executes new integrated responses and activities. Examples: constructs a new theory; develops a new and comprehensive training program; creates a new gymnastic routine. Action verbs: arranges, builds, combines, composes, constructs, creates, designs, formulates, initiates, makes, modifies, originates, re-designs, trouble-shoots.
2.4. The Type of Value Scale
Scales of measurement refer to the ways in which variables/numbers are defined and categorized. Each scale of measurement has certain properties, which in turn determine the appropriateness of certain statistical analyses. There are four measurement scales (or types of data): nominal, ordinal, interval, and ratio.
2.4.1. Nominal value scale
A nominal value scale is a scale used to identify objects, individuals, or groups. A questionnaire item scored as yes (1) or no (0) is an example of a nominal value scale. Nominal values are the least like real numbers. Nominal basically refers to discrete categorical data, such as the name of your school, the type of car you drive, gender, religion, menu items selected, etc. Examples:
What is your gender? M – male, F – female
Which recreational activities do you participate in? 1 – hiking, 2 – fishing, 3 – boating, 4 – swimming, 5 – picnicking
What is your hair colour? 1 – black, 2 – brown, 3 – blonde, 4 – gray, 5 – other
A sub-type of nominal scale with only two categories (e.g. male/female) is called “dichotomous.” Nominal data can be clearly described in pie charts because they consist of clear categories that sum up to 100%.
2.4.2. Ordinal value scale
An ordinal value scale is a scale that takes the form of a ranking. An ordinal scale orders observations from low to high, with any ties attributed to lack of measurement sensitivity, e.g. scores from a questionnaire. Examples are first rank, second rank, and so on. A questionnaire with a Likert scale uses an ordinal value scale, such as disagree (1), doubtful (2), and agree (3). Ordinal refers to quantities that have a natural ordering: the ranking of favorite sports, the order of people's places in a line, the order of runners finishing a race, or, more often, the choice on a rating scale from 1 to 5. Further examples: class ranks, social class categories, etc. Examples:
How satisfied are you with our service? 1. very unsatisfied 2. unsatisfied 3. neutral 4. satisfied 5. very satisfied
How do you feel today? 1. very unhappy 2. unhappy 3. OK 4. happy 5. very happy
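Why scale type matters when summarizing data can be shown with a small sketch. For the two scale types introduced so far, only certain statistics are meaningful: nominal data support a mode but not a mean, while ordinal data additionally support a median. The coded responses below are assumed examples.

from statistics import mode, median

# Nominal: the numbers are only labels, so the mode is the only
# sensible "average"; a mean of the codes would be meaningless.
hair_colour = [1, 2, 2, 3, 2, 5]       # 1=black, 2=brown, 3=blonde, 5=other
print(mode(hair_colour))               # 2 (brown)

# Ordinal: the values are ranked, so the median is meaningful as well.
satisfaction = [1, 3, 4, 4, 5, 2, 4]   # 1=very unsatisfied ... 5=very satisfied
print(median(satisfaction))            # 4 (satisfied)

Interval and ratio data, introduced next, additionally support means and differences.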
2.4.3. Interval value scale
An interval value scale shares the properties of the nominal and ordinal scales, but in addition its intervals are equal and fixed, so interval values can be manipulated with mathematical functions. Examples are the number of times a person goes to the market (once, twice, etc.) or a final test score. Interval data are like ordinal data, except that we can say the intervals between each value are equally split. The most common example is temperature in degrees Fahrenheit: the difference between 29 and 30 degrees is the same magnitude as the difference between 78 and 79.
2.4.4. Ratio value scale
A ratio value scale is a true value scale: it has equal distances between values and can be manipulated with mathematical functions. Ratio scales are the easiest to understand because they are numbers as we usually think of them. The distances between adjacent numbers are equal on a ratio scale, and a score of zero on a ratio scale means that there is none of whatever is being measured. Most ratio scales are counts of things. Ratio data are interval data with a natural zero point, for example weight, the length of a street, time to complete a task, the size of an object, etc.
Reference:
Clay, B. (2001). Is this a trick question? (A short guide to writing effective test questions). Kansas State Department of Education.
Garrison, C., & Ehringhaus, M. (1995). Formative and summative assessment in the classroom.
Gronlund, N. E. (1981). Measurement and evaluation. New York: Macmillan Publishing Co.
Popham, W. J. (1981). Modern educational measurement. Englewood Cliffs, NJ: Prentice-Hall, Inc.
Purwanto. (2008). Evaluasi hasil belajar. Surakarta: Pustaka Pelajar.
CHAPTER III
TEST ASSESSMENT
Purpose: After learning this material, students are expected to:
- be able to explain test assessment
- be able to describe kinds of test assessment
- understand the advantages and disadvantages of using tests
[Figure 3.1. Alternative assessment: decision making in an educational setting. The diagram situates tests and other measurement within the wider field of non-test, qualitative assessment procedures.]
As Figure 3.1 shows, tests constitute only a small set of options, among a wide range of other options, for a language teacher to make decisions about students. The judgment emanating from a test is not necessarily more valid or reliable than one deriving from qualitative procedures, since both should meet reliability and validity criteria to be considered informed decisions. The area circumscribed within quantitative decision making is relatively
small, and represents a specific choice made by the teacher at a particular time in the course, while the vast area outside, which covers all non-measurement qualitative assessment procedures, represents the wider range of procedures and their general nature. This means that qualitative approaches, which result in descriptions of individuals, as contrasted with quantitative approaches, which result in numbers, can go hand in hand with the teaching and learning experiences in the class, and they can reveal more subtle shades of students’ proficiency.
A test is a method of measuring a person's ability, knowledge, or performance to complete certain tasks or demonstrate mastery of a skill or knowledge of content. A test is a systematic procedure for observing persons and describing them with either a numerical scale or a category system. Thus a test may give either qualitative or quantitative information. Two types of tests are objective tests and essay tests.
Essay tests are appropriate when:
• The group to be tested is small and the test is not to be reused.
• You wish to encourage and reward the development of student skill in writing.
• You are more interested in exploring the student’s attitudes than in measuring his/her achievement.
Objective tests are appropriate when:
• The group to be tested is large and the test may be reused.
• Highly reliable scores must be obtained as efficiently as possible.
• Impartiality of evaluation, fairness, and freedom from possible test scoring influences are essential.
Either essay or objective tests can be used to: 1) measure almost any important educational achievement a written test can measure; 2) test understanding and ability to apply principles; 3) test ability to think critically; 4) test ability to solve problems.
3.1. Objective
Objective tests measure both your ability to remember facts and figures and your understanding of course materials. These tests are often designed to make you think independently, so don't count on recognizing the right answer. Instead, prepare yourself for high-level critical reasoning and making fine discriminations to determine the best answer. Taking an objective examination is somewhat different from taking an essay examination. The objective examination may be composed of true/false, multiple choice, or matching responses. Occasionally a fill-in section is also included. There are certain things that you must remember to do as you take this kind of test. First, roughly decide how to divide your
time. Quickly glance over the pages to see how many kinds of questions are being used and how many there are of each kind. Secondly, carefully read the instructions and make sure that you understand them before you begin to work. Indicate your answers exactly as specified in the instructions. If your instructor has not indicated whether there is a penalty for guessing, ask him or her about it; then, if there is a penalty, do not guess.
3.1.1. Multiple choice
A multiple choice test has items formatted as multiple choice questions, and the candidate must choose which answer or group of answers is correct. A multiple choice question consists of two parts: 1) the stem, the statement or question, which identifies the question or problem, and 2) the choices, also known as the distracters. Usually, students are asked to select the one alternative that best completes a statement or answers a question. Multiple choice items can also provide an excellent basis for post-test discussion, especially if the discussion addresses why the incorrect responses were wrong as well as why the correct responses were right. Unfortunately, multiple choice items are difficult and time consuming to construct well. They may also appear too discriminating (picky) to students, especially when the alternatives are well constructed, and they are open to misinterpretation by students who read more into questions than is there. Multiple choice tests can be used to test the ability to:
1. Recall memorized information
2. Apply theory to routine cases
3. Apply theory to novel situations
4. Use judgment in analyzing and evaluating
Example of multiple choice: A three-year-old child can usually be expected to:
a. Cry when separated from his or her mother
b. Have imaginary friends
c. Play with other children of the same age
d. Constantly argue with older siblings
3.1.2. True/false questions
True/false questions present candidates with a binary choice: a statement is either true or false. This method presents problems depending on the number of questions. The true/false question is also a popular question type; it has only two options. True/false questions usually state the relation of two things to one another. Because the instructor is interested in knowing whether you know when and under what circumstances something is or
is not true, s/he usually includes some qualifiers in the statement. The qualifiers must be carefully considered. With the following qualifiers, you are wiser to guess "yes" if you don't know the answer, because you may stand some chance of getting the answer right: most, some, usually, sometimes, and great. On the other hand, with these next qualifiers, you should guess "no" unless you are certain that the statement is true: all, no, always, is, never, is not, good, bad, equal, less.
The following are advantages of true/false tests:
• They can test large amounts of content
• Students can answer 3-4 questions per minute
And the disadvantages are:
• They are easy
• It is difficult to discriminate between students who know the material and students who do not
• Students have a 50-50 chance of getting the right answer by guessing
• A large number of items is needed for high reliability
Examples of true/false questions:
1. Electrons are larger than molecules. a. True b. False
2. True or false? The study of plants is known as botany. a. True b. False
3. True or false? Is it recommended to take statements directly from the text to make good true/false questions? a. True b. False
3.1.3. Matching questions type
A matching question type is an item that provides defined terms and requires the test taker to match identifying characteristics to the correct terms. Matching questions give students some opportunity for guessing. Students must know the information well, in that they are presented with two columns of items for which they must establish relationships. If only one match is allowed per item, then once items become eliminated, a few of the latter ones may be guessed. A simple matching item consists of two
columns: one column of stems or problems to be answered, and another column of responses from which the answers are to be chosen. Traditionally, the column of stems is placed on the left and the column of responses is placed on the right.
Example: Directions: match the following!
Water A. NaCl
Discovered radium B. H2O
Salt C. Fermi
Ammonia D. NH3
E. Curie
3.1.4. Completion type
A completion (fill-in-the-blank) item provides the test taker with an identifying characteristic and requires the test taker to recall the correct term. Completion items are especially useful in assessing mastery of factual information when a specific word or phrase is important to know. There are several varieties of the completion type. The easier version provides a word bank of possible words that will fill in the blanks; for some exams, all words in the word bank are used exactly once. If a teacher wanted to create a test of medium difficulty, they would provide a test with a word bank, but some words may be used more than once and others not at all. The hardest variety is a fill-in-the-blank test in which no word bank is provided at all. This generally requires a higher level of understanding and memory than multiple choice.
Advantages:
• Good for who, what, where, when content
• Minimizes guessing (compare the true/false guessing odds quantified below)
• Encourages more intensive study: the student must know the answer rather than merely recognize it
• Can usually provide an objective measure of student achievement or ability
Disadvantages:
• Difficult to assess higher levels of learning, because the answers to completion items are usually limited to a few words
• Difficult to construct so that the desired response is clearly indicated
• May overemphasize memorization of facts
• Questions may have more than one correct answer
• Scoring is time consuming
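The guessing concern can be quantified. On true/false items a student has a 0.5 chance per item, so the probability of reaching a given score by blind guessing follows a binomial distribution; on completion items with no word bank that probability is effectively zero. A short sketch, assuming a hypothetical 20-item test and a 60% pass mark (both values are assumptions for illustration):

from math import ceil, comb

def p_pass_by_guessing(n_items, pass_fraction=0.6, p=0.5):
    """Chance of reaching the pass mark by blind guessing alone
    (binomial distribution; the pass mark is an assumed example)."""
    need = ceil(pass_fraction * n_items)
    return sum(comb(n_items, k) * p**k * (1 - p)**(n_items - k)
               for k in range(need, n_items + 1))

print(round(p_pass_by_guessing(20), 3))   # ~0.252 on 20 true/false items
print(round(p_pass_by_guessing(50), 3))   # ~0.101 on 50 items

This is why the true/false disadvantages above note that a large number of items is needed for high reliability: the chance of a meaningless passing score shrinks as the test lengthens.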
A completion item requires the student to answer a question or to finish an incomplete statement by filling in a blank with the correct word or phrase. For example: A subatomic particle with a negative electric charge is called a(n) ____________.
3.2. Essay
An essay test is a test that requires the student to compose responses, usually lengthy, of up to several paragraphs. Essay tests measure higher-level thinking. A typical essay test usually consists of a small number of questions to which the student is expected to recall and organize knowledge in logical, integrated answers. Such questions test higher-level processes such as analysis, synthesis, evaluation, and creativity. The distinctive feature of essay-type tests is the freedom of response: pupils are free to select, relate, and present ideas in their own words. Items such as short answer or essay typically require a test taker to write a response to fulfill the requirements of the item. In administrative terms, essay items take less time to construct. As an assessment tool, essay items can test complex learning objectives as well as the processes used to answer the questions. The items can also provide more realistic and generalizable tasks for a test. Finally, these items make it difficult for test takers to guess the correct answers and require test takers to demonstrate their writing skills as well as correct spelling and grammar.
Uses of essay tests:
a. Assess the ability to recall, organize, and integrate ideas.
b. Assess the ability to express oneself in writing.
c. Assess the ability to supply information.
d. Assess student understanding of subject matter.
e. Measure the knowledge of factual information.
The main advantages of essay and short answer items are that they permit students to demonstrate achievement of such higher-level objectives as analyzing and critical thinking. Written items offer students the opportunity to use their own judgment, writing styles, and vocabularies. They are less time consuming to prepare than any other item type. Research indicates that students study more efficiently for essay-type examinations than for selection (multiple choice) tests. Students preparing for essay tests focus on broad issues, general concepts, and interrelationships rather than on specific details, and this studying results in somewhat better student performance regardless of the type of exam they are given. Essay
tests also give the instructor an opportunity to comment on students' progress, the quality of their thinking, the depth of their understanding, and the difficulties they may be having.
The following are the advantages of essay tests:
• Students are less likely to guess
• Easy to construct
• Stimulates more study
• Allows students to demonstrate the ability to organize knowledge, express opinions, and show originality
Disadvantages:
• Can limit the amount of material tested, and therefore has decreased validity
• Subjective, potentially unreliable scoring
• Time consuming to score
Types of essay test:
3.2.1. Restricted response
The restricted response question usually limits both the content and the response. The content is usually restricted by the scope of the topic to be discussed, and limitations on the form of response are generally indicated in the question. Another way of restricting responses in essay tests is to base the questions on specific problems. For this purpose, introductory material like that used in interpretive exercises can be presented. Such items differ from objective interpretive exercises only in that essay questions are used instead of multiple choice or true/false items. Because the restricted response question is more structured, it is most useful for measuring learning outcomes requiring the interpretation and application of data in a specific area. Examples of restricted response: describe two situations that demonstrate the application of the law of supply and demand; state any five definitions of education.
Advantages of restricted response questions:
• restricted response questions are more structured
• they measure specific learning outcomes
• they provide for more ease of assessment
• any outcome measured by an objective interpretive exercise can be measured by a restricted response question
3.2.2. Extended response
An extended response question allows students to select the information that they think is pertinent, to organize the answer in accordance with their best judgment, and to integrate and evaluate ideas as they think suitable. No restriction is placed on the student as to the points he or she will discuss and the type of organization he or she will use. Such questions do not set limits on the length or exact content to be discussed. Teachers frame these questions in such a way as to give students the maximum possible freedom to determine the nature and scope of the answer, which should of course relate to the topic and be produced within the stipulated time frame. The student may select the points he or she thinks are most important, pertinent, and relevant, and arrange and organize the answer in whichever way he or she wishes; such questions are therefore also called free response questions. This enables the teacher to judge students’ abilities to organize, integrate, and interpret the material and to express themselves in their own words. It also gives an opportunity to comment on or look into students’ progress, the quality of their thinking, the depth of their understanding and problem-solving skills, and the difficulties they may be having. These skills interact with each other and with the knowledge and understanding the problem requires. Thus it is at the levels of synthesis and evaluation, and of writing skills, that this type of question makes its greatest contribution. Examples: 1) Describe at length the defects of the present-day examination system in the state of Maharashtra, and suggest ways and means of improving the examination system. 2) Describe the character of Hamlet. 3) Global warming is the next step to disaster: discuss.
Reference:
Clay, B. (2001). Is this a trick question? (A short guide to writing effective test questions). Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin.
Garrison, C., & Ehringhaus, M. (1995). Formative and summative assessment in the classroom.
Gronlund, N. E. (1981). Measurement and evaluation. New York: Macmillan Publishing Co.
Popham, W. J. (1981). Modern educational measurement. Englewood Cliffs, NJ: Prentice-Hall, Inc.
Purwanto. (2008). Evaluasi hasil belajar. Surakarta: Pustaka Pelajar.
CHAPTER IV
NON TEST ASSESSMENT
Purpose: After learning this material, students are expected to:
- be able to explain non-test assessment
- be able to describe kinds of non-test assessment
A non-test is an instrument other than an academic achievement test. The item writing procedure for non-test instruments is the same as the procedure for writing items for a learning achievement test: construct the test blueprint (grid), write items according to the blueprint, review the items, validate the items, try the items out, and refine the items based on the results of the trials.
4.1. Observation
Observation should follow an established plan or checklist organized around concrete, objective data, and it needs to be tied to the objectives of the course. Through observation, teachers can assess their students' abilities simply by observing their classroom behavior or completion of activities. By watching students as they work, teachers can identify signs of struggle and determine where a child may be experiencing academic difficulties. Because students often do not realize that they are being observed, teachers can ensure that the picture they receive of student understanding represents the student's actual abilities. For most practitioners, observation is a feature of everyday working life, and practitioners can often be found with a notebook and pen close to hand to jot down unplanned observations that can be added to normal recording systems at a later time. However, as previously discussed, specific observations should be planned. Prior to beginning the observation, practitioners should work through the stages outlined in the previous section and, as a part of this process, the most appropriate observational method should be selected from the range available. It will also be helpful to produce a cover sheet including such details as:
• child’s name
• child’s age
• date
• name of observer
• the specific setting or area of setting
• permissions gained
• aims and purpose of observation
• start and finish times
4.2. Interview
Interviews are the most frequently used method of personnel selection, but they are also used for school admissions, promotions, scholarships, and other awards. Interviews vary in their content and structure. In a structured interview, questions are prepared before the interview starts. An unstructured interview simply represents a free conversation between an interviewer and interviewee, giving the interviewer the freedom to adaptively or intuitively switch topics. Research has shown that unstructured interviews lack predictive validity, or show lower predictive validity than structured interviews. The best practices for conducting interviews are:
• A high degree of structure
• Selection of questions according to job requirements
• Assessment of aspects that cannot be better assessed with other methods
• Scoring with pre-tested, behavior-anchored rating scales
• Empirical examination of each question
• Rating only after the interview
• Standardized scoring (see the sketch below)
• Training of interviewers
Structured interviews can be divided into three types:
a. The behavioral description interview involves questions that refer to past behavior in real situations; it is also referred to as the job-related interview.
b. The situational interview uses questions that require interviewees to imagine hypothetical situations (derived from critical incidents) and state how they would act in such situations.
c. The multimodal interview combines the two approaches above and adds unstructured parts to ensure high respondent acceptance.
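Two of the practices listed above, behavior-anchored rating scales and standardized scoring, can be illustrated with a minimal sketch. The questions, anchor wordings, and weights below are invented for illustration; a real instrument would derive them from a job analysis.

# Each structured-interview question is rated against behavioral anchors;
# the final score is a standardized weighted aggregate of the ratings.
# Questions, anchors, and weights here are illustrative assumptions.
ANCHORS = {1: "no relevant behavior described",
           3: "partially effective behavior described",
           5: "clearly effective behavior described"}

def interview_score(ratings, weights):
    """Combine per-question anchor ratings; weights should sum to 1."""
    assert set(ratings) == set(weights)
    return sum(ratings[q] * weights[q] for q in ratings)

ratings = {"past_conflict": 5, "hypothetical_deadline": 3, "teamwork": 4}
weights = {"past_conflict": 0.4, "hypothetical_deadline": 0.3, "teamwork": 0.3}
print(interview_score(ratings, weights))   # 4.1 on the anchored 1-5 scale

Because every interviewer applies the same anchors and the same aggregation rule, scores become comparable across candidates, which is the point of standardized scoring.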
Analyses of the predictive validity of interviews for job performance have shown that they are good predictors of job performance, that they add incremental validity above and beyond general mental ability, and that behavioral description interviews show a higher validity than situational interviews. Interviews are less predictive of academic performance than of job-related outcomes. Predictive validity probably also depends on the content of the interview, but the analyses aggregated interviews with different contents.
4.3. Questionnaire
Questionnaires are the most commonly used method for collecting information from program participants when evaluating educational and extension programs. There are nine steps involved in the development of a questionnaire:
1. Decide the information required.
2. Define the target respondents.
3. Choose the method(s) of reaching your target respondents.
4. Decide on question content.
5. Develop the question wording.
6. Put questions into a meaningful order and format.
7. Check the length of the questionnaire.
8. Pre-test the questionnaire.
9. Develop the final survey form.
4.4. Portfolios
A portfolio is a collection of student work with a common theme or purpose. Like a photographer's portfolio, it should contain the best examples of the student's work. For subjects that are paper-based, the collection of a portfolio is simple. Homework is a structured practice exercise that usually plays a part in grading; sometimes instructors assign reading or other homework which covers the theoretical aspects of the subject matter, so that class time can be used for more hands-on practical work. In a portfolio assessment, a teacher looks not at one piece of work as a measure of student understanding, but instead at the body of work the student has produced over a period of time. To allow for a portfolio assessment, a teacher must compile student work throughout the term. This is commonly accomplished by providing each student with a folder in which to store essays or other large activities. Upon
compilation of the portfolio, the teacher can review the body of work and determine the degree to which the work indicates the student's understanding of the content.
Advantages of portfolio assessment:
• Assesses what students can do and not just what they know.
• Engages students actively.
• Fosters student-teacher communication and depth of exploration.
• Enhances understanding of the educational process among parents and in the community.
• Provides goals for student learning.
• Offers an alternative to traditional tests for students with special needs.
The use of the portfolio as an assessment tool is a process with multiple steps. The process takes time, and all of the component parts must be in place before the assessment can be utilized effectively.
a. Decide on a purpose or theme. General assessment alone is not a sufficient goal for a portfolio; it must be decided specifically what is to be assessed. Portfolios are most useful for addressing the student’s ability to apply what has been learned. Therefore, a useful question to consider is: What skills or techniques do I want the students to learn to apply? The answer to this question can often be found in the school curriculum.
b. Consider what samples of student work might best illustrate the application of the standard or educational goal in question. Written work samples, of course, come to mind. However, videotapes, pictures of products or activities, and testimonials are only a few of the many different ways to document achievement.
c. Determine how samples will be selected. A range of procedures can be utilized here. Students, perhaps in conjunction with parents and teachers, might select work to be included, or a specific type of sample might be required by the teacher, the school, or the school system.
d. Decide whether to assess the process and the product, or the product only. Assessing the process would require some documentation regarding how the learner developed the product. For example, did the student use the process for planning a short story, or utilize the experimental method, that was taught in class? Was it used correctly? Evaluation of the process will require a procedure for accurately documenting the process used; the documentation could include a log or video of the steps or an interview with the student. Usually, if both the process and the product are to be evaluated, a separate scoring system will have to be developed for each.
e. Develop an appropriate scoring system. Usually this is best done through the use of a rubric, a point scale with descriptors that explain how the work will be evaluated. Points are allotted, with the highest quality work getting the most points. If the descriptors are clear and specific, they become goals for which the student can aim. There should be a separate scale for each standard being evaluated. For example, if one standard being assessed is the use of grammatically correct sentence structure, five points might be allotted if all sentences are grammatically correct; then a specific number of errors would be identified for all other point values, with zero points given if there are more than a certain number of errors. It is important that the standards for evaluation be carefully explained. If we evaluate for clarity of writing, then an operational description of what is meant by clarity should be provided. The number of points available should be small enough to be practical and meaningful; an allotment of 20 points for clarity is not workable, because an evaluator cannot really distinguish between a 17- and an 18-point product with regard to clarity.
f. Share the scoring system with the students. Qualitative descriptors of how the student will be evaluated, known in advance, can guide learning and performance.
g. Engage the learner in a discussion of the product. Through the process of discussion, the teacher and the learner can explore the material in more depth, exchange feelings and attitudes with regard to the product and the learning process, and reap the greatest advantage of effective portfolio implementation.
4.5. Case Studies
Case studies and problem-solving assignments can be used to apply knowledge. This type of assignment requires the student to place him- or herself in, or react to, a situation where their prior learning is needed to solve the problem or evaluate the situation. Case studies should be realistic and practical, with clear instructions.
4.6. Project
Projects are usually designed so that students can apply many of the skills they have developed in the course by producing a product of some kind. Usually project assignments are given early in the course, with a completion date toward the end of the quarter. By asking students to complete a project, teachers can see how well their pupils can apply taught information. Successful completion of a project requires a student to translate their learning into the completion of a task. Project-based assessment more closely approximates how students will be assessed in the real world, as employers will not ask their employees to take
tests, but will instead judge their merit upon the work they complete. The project is an example of a performance task.
Reference:
Arvey, R. D., & Campion, J. E. (1982). The employment interview: A summary and review of recent research. Personnel Psychology.
Clay, B. (2001). Is this a trick question? (A short guide to writing effective test questions). Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin.
Damiani, V. B. (2004). Portfolio assessment in the classroom. National Association of School Psychologists.
Garrison, C., & Ehringhaus, M. (1995). Formative and summative assessment in the classroom.
Gronlund, N. E. (1981). Measurement and evaluation. New York: Macmillan Publishing Co.
Janz, T., Hellervik, L., & Gilmore, D. C. (1986). Behavior description interviewing (BDI). Boston: Allyn & Bacon.
Latham, G. P., Saari, L. M., Pursell, E. D., & Campion, M. A. (1980). The situational interview (SI). Journal of Applied Psychology.
Popham, W. J. (1981). Modern educational measurement. Englewood Cliffs, NJ: Prentice-Hall, Inc.
Purwanto. (2008). Evaluasi hasil belajar. Surakarta: Pustaka Pelajar.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology. Psychological Bulletin.
Schuler, H. (2002). Das Einstellungsinterview (Multimodales Interview [MMI]). Göttingen: Hogrefe.
CHAPTER V
VALIDITY TEST
Purpose:
- Be able to explain the definition of test validity
- Be able to explain the function of test validity
- Be able to test for validity
Test validity is the extent to which a test (such as a chemical, physical, or scholastic test) accurately measures what it purports to measure. Validity is divided into various kinds, such as content validity, criterion validity, and construct validity.
5.1. Content Validity
Content validity is an estimate of how much a measure represents every single element of a construct. Content validity is the extent to which the content of the test matches the instructional objectives. For example, a semester or quarter exam that only includes content covered during the last six weeks is not a valid measure of the course's overall objectives; it has very low content validity.
5.2. Criterion Validity
Criterion validity assesses whether a test reflects a certain set of abilities. If the criterion is obtained some time after the test is given, one is studying predictive validity. If
the test score and criterion score are determined at essentially the same time, one is studying concurrent validity. Concurrent validity measures the test against a benchmark test, and a high correlation indicates that the test has strong criterion validity. In concurrent validity, we assess the operationalization's ability to distinguish between groups that it should theoretically be able to distinguish between. For example, if we come up with a way of assessing manic depression, our measure should be able to distinguish between people who are diagnosed with manic depression and those diagnosed as paranoid schizophrenic. If we want to assess the concurrent validity of a new measure of empowerment, we might give the measure to both migrant farm workers and farm owners, theorizing that our measure should show that the farm owners are higher in empowerment. As in any discriminating test, the results are more powerful if you are able to show that you can discriminate between two groups that are very similar. If end-of-year math tests in 4th grade correlate highly with the statewide math tests, they have high concurrent validity.
Predictive validity is a measure of how well a test predicts abilities. It involves testing a group of subjects for a certain construct and then comparing them with results obtained at some point in the future. In predictive validity, we assess the operationalization's ability to predict something it should theoretically be able to predict. For instance, we might theorize that a measure of math ability should be able to predict how well a person will do in an engineering-based profession. We could give our measure to experienced engineers and see if there is a high correlation between scores on the measure and their salaries as engineers. A high correlation would provide evidence for predictive validity: it would show that our measure can correctly predict something that we theoretically think it should be able to predict.
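In practice, checking predictive validity usually comes down to correlating scores on the measure with the later criterion and inspecting the coefficient. A minimal sketch, with invented data (a real validation study would of course need a properly drawn sample):

from statistics import correlation   # Pearson's r (Python 3.10+)

# Hypothetical data: scores on a math-ability measure, and a later
# criterion such as performance in an engineering course.
test_scores   = [62, 71, 55, 80, 68, 75, 58, 85]
later_outcome = [65, 70, 52, 84, 66, 78, 60, 88]

r = correlation(test_scores, later_outcome)
print(round(r, 2))   # a high positive r is evidence of predictive validity

For concurrent validity the computation is identical; the only difference is that the criterion scores are collected at essentially the same time as the test scores.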
5.3. Construct Validity
Construct validity is an assessment of how well you have translated your ideas or theories into actual programs or measures. Construct validity defines how well a test or experiment measures up to its claims: a test designed to measure depression must measure only that particular construct, not closely related constructs such as anxiety or stress. Construct validity refers to the degree to which inferences can legitimately be made from the operationalizations in your study to the theoretical constructs on which those operationalizations were based. Like external validity, construct validity is related to generalizing. But where external validity involves generalizing from your study context to other people, places, or times, construct validity involves generalizing from your program or measures to the concepts behind your program or measures.
Convergent validity tests that constructs that are expected to be related are, in fact, related. In convergent validity, we examine the degree to which the operationalization is similar to (converges on) other operationalizations that it theoretically should be similar to. For instance, to show the convergent validity of a Head Start program, we might gather evidence that shows that the program is similar to other Head Start programs. Or, to show the convergent validity of a test of arithmetic skills, we might correlate the scores on our test with scores on other tests that purport to measure basic math ability, where high correlations would be evidence of convergent validity.
Discriminant validity tests that constructs that should have no relationship do, in fact, not have any relationship (it is also referred to as divergent validity). In discriminant validity, we examine the degree to which the operationalization is not similar to (diverges from) other operationalizations that it theoretically should not be similar to. For instance, to show the discriminant validity of a Head Start program, we might gather evidence that shows that the program is not similar to other early childhood programs that don't label themselves as Head Start programs. Or, to show the discriminant validity of a test of arithmetic skills, we might correlate the scores on our test with scores on tests of verbal ability, where low correlations would be evidence of discriminant validity.
Reference:
Clay, B. (2001). Is this a trick question? (A short guide to writing effective test questions). Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin.
Garrison, C., & Ehringhaus, M. (1995). Formative and summative assessment in the classroom.
Gronlund, N. E. (1981). Measurement and evaluation. New York: Macmillan Publishing Co.
Popham, W. J. (1981). Modern educational measurement. Englewood Cliffs, NJ: Prentice-Hall, Inc.
Purwanto. (2008). Evaluasi hasil belajar. Surakarta: Pustaka Pelajar.
CHAPTER VI
RELIABILITY TEST

Purpose:
- Be able to explain the definition of a reliability test
- Be able to explain the function of a reliability test
- Be able to carry out a reliability test

Reliability relates to the consistency of an assessment. Reliability is a necessary but not sufficient condition for validity. For instance, if the needle of a scale is always five pounds away from zero, the scale always over-reports my weight by five pounds. The measurement is consistent, but it is consistently wrong, so the measurement is not valid. A reliable assessment is one that consistently achieves the same results with the same (or a similar) cohort of students. Various factors affect reliability, including ambiguous questions, too many options within a question paper, vague marking instructions, and poorly trained markers. Reliability testing methods can be divided into two groups: external consistency methods and internal consistency methods.

6.1. External Consistency Reliability

Reliability as external consistency holds that a test is reliable if, after being administered several times, it gives relatively consistent results. The methods included in this group are the test-retest method and the parallel forms method.
Table 6.1 Test-Retest and Parallel Forms

1. Test-Retest. Procedure: the same test is given twice to the same students at different times. Technique: product-moment correlation between the scores of test 1 and test 2.
2. Parallel Forms. Procedure: two similar (parallel) tests are given to the same group of learners. Technique: product-moment correlation between the scores of instrument 1 and instrument 2.

6.1.1. Test-Retest Reliability

Test-retest reliability is used to assess the consistency of a measure from one time to another: the reliability of an achievement test is measured by administering the same test repeatedly. The weakness of this method is that if the time interval is too short, learners may still remember the tested material at the second administration, so the second result may be better than the first. The reliability coefficient in this case is simply the correlation between the scores obtained by the same persons on the two administrations of the test. If the first test result parallels the second test result, the test is said to be reliable. The analysis is done by finding the correlation between the first and second test results, using the Pearson product-moment correlation coefficient (r). The value of r always falls within the range -1 to +1.

Example:

No  Student name  Score test 1 (X)  Score test 2 (Y)
1   Agustina      78                80
2   Feby          80                85
3   Antoni        77                80
4   Chandra       90                85
5   Dionisius     70                75
6   Fitriani      73                78
etc.

The formula:

$$r_{XY} = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\{N\sum X^{2} - (\sum X)^{2}\}\{N\sum Y^{2} - (\sum Y)^{2}\}}}$$
Description:
N = number of students
X = score on test 1
Y = score on test 2

6.1.2. Parallel Forms Reliability

Parallel forms reliability is used to assess the consistency of the results of two tests constructed in the same way from the same content domain. This method requires two sets of questions that have the same goals, level of difficulty, and composition of material, but different items; in other words, the two tests must be parallel. The reliability coefficient is obtained by correlating the results of the first test with the results of the second test.

Example:

No  Student name  Result of Instrument 1 (X)  Result of Instrument 2 (Y)
1   Fransiska     78                          80
2   Johnson       80                          85
3   Leona         77                          80
4   Ratya         90                          85
5   Febriyanti    70                          75
6   Karmila       73                          78
etc.

The formula:

$$r_{XY} = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\{N\sum X^{2} - (\sum X)^{2}\}\{N\sum Y^{2} - (\sum Y)^{2}\}}}$$

Description:
N = number of students
X = score from instrument test 1
Y = score from instrument test 2
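As a worked illustration, the sketch below applies the product-moment formula above to the six pairs of scores in the test-retest example table (the rows hidden behind "etc." are omitted, so the result only illustrates the procedure). The same function applies unchanged to the parallel forms example.

```python
# A small sketch applying the product-moment formula to the six
# test-retest scores from the example table above.
from math import sqrt

def product_moment_r(x, y):
    """Pearson product-moment correlation, written out as in the formula."""
    n = len(x)
    num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    den = sqrt((n * sum(a * a for a in x) - sum(x) ** 2)
               * (n * sum(b * b for b in y) - sum(y) ** 2))
    return num / den

test_1 = [78, 80, 77, 90, 70, 73]  # Agustina, Feby, Antoni, Chandra, Dionisius, Fitriani
test_2 = [80, 85, 80, 85, 75, 78]

print(f"test-retest reliability r_XY = {product_moment_r(test_1, test_2):.3f}")
```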
6.2. Internal Consistency Reliability

Reliability as internal consistency holds that a test is reliable if its items yield consistent measurement results. The test-retest and parallel forms methods have the disadvantage that they are time-consuming. In most cases the researcher wants to estimate reliability from a single administration of a test. This requirement has led to measures of internal consistency, or homogeneity. Internal consistency measures consistency within the tool. Several internal consistency methods exist; all have one thing in common, namely that the measurement is based on the results of a single administration. In the present study, the split-half technique and Cronbach's Alpha method were used to estimate internal consistency reliability. SPSS 17 statistical software was used for the split-half analyses (Spearman-Brown formula and Guttmann's formula) and for Cronbach's Alpha; MS Excel was used for the split-half calculation with Flanagan's formula.

6.2.1. Split-Half Reliability Method

In the split-half reliability method, the instrument is first divided into two equivalent halves and the correlation coefficient between the scores on these half-tests is found. This correlation coefficient denotes the reliability of the half-test. The self-correlation coefficient of the whole test is then estimated by different formulas. The measuring instrument can be divided into two halves in a number of ways, but the best way is to find the correlation coefficient between the scores on odd-numbered and even-numbered items. In the present study, the correlation coefficient was calculated using the following formulas:

a. Spearman-Brown Formula

The Spearman-Brown formula was designed to estimate the reliability of a test n times as long as the one for which we know a self-correlation. From the reliability of the half-test, the self-correlation coefficient of the whole test is estimated by the following Spearman-Brown formula:

$$r_{tt} = \frac{2\,r_{hh}}{1 + r_{hh}}$$

Where:
rtt = reliability of the total test estimated from the reliability of one of its halves (reliability coefficient of the whole test)
rhh = self-correlation of a half-test (reliability coefficient of the half-test)

b. Rulon/Guttmann's Formula

An alternative method for finding split-half reliability was developed by Rulon. It requires only the variance of the differences between each person's scores on the two half-tests and the variance of total scores. These two values are substituted into the following formula, which yields the reliability of the whole test directly:

$$r_{tt} = 1 - \frac{SD_{d}^{2}}{SD_{x}^{2}}$$

Where:
rtt = reliability of the test
SDd = SD of the differences between the half-test scores
SDx = SD of the scores of the whole test

c. Flanagan Formula

Flanagan gave a parallel formula for finding reliability using the split-half method. Flanagan's formula for reliability is described below:

$$r_{tt} = 2\left(1 - \frac{SD_{1}^{2} + SD_{2}^{2}}{SD_{t}^{2}}\right)$$

Where:
rtt = reliability of the test
SD1 = SD of the scores on the 1st half
SD2 = SD of the scores on the 2nd half
SDt = SD of the scores of the whole test

6.2.2. Cronbach's Alpha Method

Cronbach's Alpha is mathematically equivalent to the average of all possible split-half estimates. The statistical analysis computer program SPSS 17 was used to calculate Cronbach's Alpha (α).
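The sketch below, using an invented four-item score matrix, pulls the internal consistency estimates together: an odd/even split with the Spearman-Brown correction, the Rulon/Guttmann and Flanagan variants, and Cronbach's Alpha. The text above does not give the Alpha formula, so the standard one is assumed here: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

```python
# A minimal sketch of the internal consistency estimates described
# above, computed on a hypothetical 4-item, 6-student score matrix.
import numpy as np

# rows = students, columns = items; these scores are invented
scores = np.array([
    [4, 3, 4, 5],
    [2, 2, 3, 3],
    [5, 4, 4, 5],
    [3, 3, 2, 4],
    [4, 4, 5, 4],
    [1, 2, 2, 2],
])

# Split into odd-numbered and even-numbered items (columns 1,3,... vs 2,4,...)
odd_half = scores[:, 0::2].sum(axis=1)
even_half = scores[:, 1::2].sum(axis=1)
total = scores.sum(axis=1)

# Half-test correlation, stepped up with the Spearman-Brown formula
r_hh = np.corrcoef(odd_half, even_half)[0, 1]
r_spearman_brown = 2 * r_hh / (1 + r_hh)

# Rulon/Guttmann: variance of half-score differences over variance of totals
r_rulon = 1 - np.var(odd_half - even_half) / np.var(total)

# Flanagan: variances of the two halves over variance of totals
r_flanagan = 2 * (1 - (np.var(odd_half) + np.var(even_half)) / np.var(total))

# Cronbach's Alpha from item variances (standard formula, assumed above)
k = scores.shape[1]
alpha = k / (k - 1) * (1 - scores.var(axis=0).sum() / total.var())

print(f"Spearman-Brown: {r_spearman_brown:.3f}")
print(f"Rulon/Guttmann: {r_rulon:.3f}")
print(f"Flanagan:       {r_flanagan:.3f}")
print(f"Cronbach alpha: {alpha:.3f}")
```

Note that the Rulon and Flanagan formulas are algebraically equivalent, so the sketch prints the same value for both; the Spearman-Brown estimate can differ because it assumes the two halves have equal variances.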
Reference:
Clay, B. 2001. Is This a Trick Question? (A Short Guide to Writing Effective Test Questions). Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. 1955. Construct Validity in Psychological Tests. Psychological Bulletin.
Garrison, C., & Ehringhaus, M. 1995. Formative and Summative Assessment in the Classroom.
Gronlund, N. E. 1981. Measurement and Evaluation. New York: Macmillan Publishing Co.
Popham, W. J. 1981. Modern Educational Measurement. Englewood Cliffs, NJ: Prentice Hall, Inc.
Purwanto. 2008. Evaluasi Hasil Belajar. Surakarta: Pustaka Pelajar.

Curriculum Vitae

Kadek Ayu Astiti, S. Pd., M. Pd., was born in Singaraja on September 28, 1988. She is the second child of the couple Ni Ketut Sudi and Made Suarsini. Website address: www.kadekayuastiti.blogspot.com. History of education: elementary school No. 6 Kampung Baru Singaraja-Bali; SMP Negeri 3 Singaraja-Bali; SMA N 1 Singaraja-Bali; S1 in Physics Education at Ganesha University of Education; S2 in Science Education at Ganesha University of Education. Employment history: laboratory assistant at SMP N 1 Singaraja-Bali (2010-2011); teacher at SMP N 1 Singaraja-Bali (2011-2013); lecturer in physics education courses at the University of Nusa Cendana (2014-present).