Teaching Material
PHYSICS EDUCATIONAL ASSESSMENT
BY:
KADEK AYU ASTITI, S. PD., M. PD.
NIP. 20140928 201404 2 002
Supported by:
PGMIPAU Funds, 2014
PHYSICS EDUCATION PROGRAM
MATHEMATICS AND SCIENCE DEPARTMENT
FACULTY OF TEACHER TRAINING AND EDUCATION
NUSA CENDANA UNIVERSITY
2014
PREFACE 
A notable concern of many teachers is that they frequently have the task of
constructing assessments to reflect on learning but have relatively little training or information
to rely on for this task. Assessment is central to teaching and learning. It is very
important for our students because it shows them where they are falling short. That is why
teachers should always discuss exams with students afterwards, to show them what the right
answers were and where they made mistakes. For the same reason, students must be given
their marks, and their exam scripts, as soon as possible. Assessment for learning focuses on
opportunities to develop students' ability to evaluate themselves, to make judgements
about their own performance, and to improve upon it. It makes use of authentic assessment
methods and offers many opportunities for students to develop their skills through formative
assessment, using summative assessment sparingly. To assess effectively, teachers
must understand the types of assessment, the types of assessment scales, methods of
test construction, and test validity and reliability. Each aspect is discussed in this sourcebook.
To help teachers assess, Part One presents the meaning and types of assessment,
addresses general test construction, and introduces the six levels of
intellectual understanding: knowledge, comprehension, application, analysis, synthesis, and
evaluation. These levels of understanding assist in categorizing test questions, with
knowledge as the lowest level. Part Two of the sourcebook is devoted to actual
test question construction and to tests of validity and reliability. Five test item types are discussed:
multiple choice, true/false, matching, completion, and essay. Information covers the
appropriate use of each item type, the advantages and disadvantages of each item type, and the
characteristics of well-written items. Suggestions for addressing higher-order thinking skills
for each item type are also presented. This sourcebook was developed to accomplish three
outcomes: 1) teachers will know the meaning of assessment and follow appropriate principles for
developing and using assessment methods in their teaching, avoiding common pitfalls in
student assessment; 2) teachers will be able to identify and accommodate the limitations of
different informal and formal assessment methods; 3) teachers will gain an awareness that
certain assessment approaches can be incompatible with certain instructional goals.
Kadek Ayu Astiti, S. Pd., M. Pd.
Contents
Preface
CHAPTER I TYPES OF ASSESSMENT
1.1 The Difference between Measurement, Assessment, and Evaluation
1.2 General Types of Assessment
1.3 Norm-Referenced Assessment and Criterion-Referenced Assessment
CHAPTER II OBJECTS OF LEARNING EVALUATION
2.1 Cognitive Learning Outcomes
2.2 Affective Learning Outcomes
2.3 Psychomotor Learning Outcomes
2.4 Types of Value Scales
CHAPTER III TEST ASSESSMENT
3.1 Objective
3.2 Essay
CHAPTER IV NON-TEST ASSESSMENT
4.1 Observation
4.2 Interview
4.3 Questionnaire
4.4 Portfolios
4.5 Project
CHAPTER V VALIDITY TEST
5.1 Content Validity
5.2 Criterion Validity
5.3 Construct Validity
CHAPTER VI RELIABILITY TEST
6.1 External Consistency Reliability
6.2 Internal Consistency Reliability
CHAPTER I
TYPES OF ASSESSMENT
Purpose: After learning this material, students are expected to:
- Be able to explain the definition of assessment
- Be able to explain the difference between measurement, assessment, and evaluation
- Be able to mention the types of assessment (summative and formative)
- Understand the concepts of criterion-referenced and norm-referenced frameworks
1.1. The Difference between Measurement, Assessment, and Evaluation
There is a lot of confusion over these three terms, as well as other terms associated
with measurement, assessment, and evaluation. The following is an explanation of each of
these terms:
Measurement, beyond its general definition, refers to the set of procedures, and the principles
for how to use those procedures, in educational tests and assessments; examples include
derived scores, standard scores, and the like. A measurement takes place when a "test" is given and a "score"
is obtained. If the test collects quantitative data, the score is a number. If the test collects
qualitative data, the score may be a phrase or word such as "excellent."
Assessment is a process by which information is obtained relative to some known objective
or goal. An assessment may include a test, but it also includes
methods such as observations, interviews, behavior monitoring, and so on.
Evaluation focuses on grades and may reflect classroom components other than course
content and mastery level. Evaluations are procedures used to determine whether the subject
(i.e. the student) meets preset criteria, such as qualifying for special education services. Evaluation
uses assessment (remember that an assessment may be a test) to make a determination of
qualification in accordance with predetermined criteria.
For the purpose of schematic representation, the three concepts of evaluation,
assessment, and measurement have traditionally been depicted as three concentric circles of
varying sizes. Figure 1.1 shows the relationship among these concepts.
Figure 1.1 The relationship among measurement, assessment, and evaluation (concentric
circles, with measurement innermost and evaluation outermost)
Assessment plays a major role in how students learn, their motivation to learn, and how
teachers teach. Assessment is used for various purposes.
• Assessment for learning: where assessment helps teachers gain insight into what
students understand in order to plan and guide instruction, and provides helpful
feedback to students.
• Assessment as learning: where students develop an awareness of how they learn and
use that awareness to adjust and advance their learning, taking increased
responsibility for their learning.
• Assessment of learning: where assessment informs students, teachers, and parents, as
well as the broader educational community, of achievement at a certain point in time,
in order to celebrate success, plan interventions, and support continued progress.
Assessment must be planned with its purpose in mind. Assessment for, as, and of learning all
have a role to play in supporting and improving student learning, and must be appropriately
balanced. The most important part of assessment is the interpretation and use of the
information that is gleaned for its intended purpose. Assessment is embedded in the learning
process. It is tightly interconnected with curriculum and instruction. As teachers and students
work towards the achievement of curriculum outcomes, assessment plays a constant role in
informing instruction, guiding the student's next steps, and checking progress and
achievement. Teachers use many different processes and strategies for classroom assessment,
and adapt them to suit the assessment purpose and the needs of individual students.
Table 1.1 Classroom assessment: from ... to ...
1. From classroom tests disconnected from the focus of instruction, to classroom tests
reflecting the written and taught curriculum.
2. From assessment using only selected response formats, to assessment methods selected
intentionally to reflect specific kinds of learning targets.
3. From mystery assessment, where students don't know in advance what they are
accountable for learning, to transparency in assessment, where students know in advance
what they will be held accountable for learning.
4. From all assessments and assignments, including practice, "counting" toward the grade,
to some assessments and assignments "counting" toward the grade while others are for
practice or other formative use.
5. From students as passive participants in the assessment process, to students as active
users of assessments as learning experiences.
6. From students not finding out until the graded event what they are good at and what
they need to work on, to students being able to identify their strengths and areas for
further study during learning.
1.2. General Types of Assessment
1.2.1. Summative assessment
Summative assessments are cumulative evaluations used to measure student growth
after instruction; they are generally given at the end of a course in order to determine whether
long-term learning goals have been met. Summative assessments provide
evidence of student achievement for the purpose of making a judgment about student
competence or program effectiveness. Typically, summative evaluation concentrates on
learner outcomes rather than only the program of instruction. It is a means of determining a
student's mastery and understanding of information, skills, concepts, and processes. Summative
assessments occur at the end of a formal learning experience, either a class or a program, and
may include a variety of activities, for example tests, demonstrations, portfolios, internships,
clinicals, and capstone projects. Summative assessment is a high-stakes type of assessment for the
purpose of making a final judgment about student achievement and instructional effectiveness.
By the time summative assessments occur, students have typically exited the learning mode.
Teachers and schools can use these assessments to identify strengths and weaknesses of
curriculum and instruction, with improvements affecting the next year's or term's students.
Summative assessments are given periodically to determine, at a particular point in time, what
students know and do not know. Many associate summative assessments only with
standardized tests such as state assessments, but they are also used at, and are an important
part of, district and classroom programs. Summative assessment at the district and classroom
level is an accountability measure that is generally used as part of the grading process. The list
is long, but here are some examples of summative assessments:
a) State assessments 
b) District benchmark or interim assessments 
c) End-of-unit or chapter tests 
d) End-of-term or semester exams 
e) Scores that are used for accountability of schools (AYP) and students (report card grades). 
The key is to think of summative assessment as a means to gauge, at a particular point 
in time, student learning relative to content standards. Although the information gleaned from 
this type of assessment is important, it can only help in evaluating certain aspects of the 
learning process. Because they are spread out and occur after instruction every few weeks, 
months, or once a year, summative assessments are tools to help evaluate the effectiveness of 
programs, school improvement goals, alignment of curriculum, or student placement in 
specific programs. Summative assessments happen too far down the learning path to provide 
information at the classroom level and to make instructional adjustments and interventions 
during the learning process. It takes formative assessment to accomplish this. The goal of 
summative assessment is to evaluate student learning at the end of an instructional unit by 
comparing it against some standard or benchmark. Information from summative assessments 
can be used formatively when students or faculty use it to guide their efforts and activities in 
subsequent courses. 
1.2.2. Formative assessment
Formative assessment is part of the instructional process and an
integral part of teaching and learning. It encompasses ongoing assessments, reviews,
and observations in a classroom. Teachers use formative assessment to improve instructional
methods and student feedback throughout the teaching and learning process. For example, if a
teacher observes that some students do not grasp a concept, she or he can design a review
activity or use a different instructional strategy. Likewise, students can monitor their progress
with periodic quizzes and performance tasks. The results of formative assessments are used to
modify and validate instruction. Formative assessment occurs in the short term, as learners are
in the process of making meaning of new content and of integrating it into what they already
know. When incorporated into classroom practice, it provides the information needed to
adjust teaching and learning while they are happening. In this sense, formative assessment
informs both teachers and students about student understanding at a point when timely
adjustments can be made. These adjustments help to ensure students achieve targeted standards-based
learning goals within a set time frame. Although formative assessment strategies appear
in a variety of formats, there are some distinct ways to distinguish them from summative
assessments. Formative assessment helps teachers determine next steps during the learning
process as the instruction approaches the summative assessment of student learning.
Some of the instructional strategies that can be used formatively include the following: 
1. Criteria and goal setting with students engages them in instruction and the learning 
process by creating clear expectations. In order to be successful, students need to 
understand and know the learning target/goal and the criteria for reaching it. Establishing 
and defining quality work together, asking students to participate in establishing norms of
behavior for classroom culture, and determining what should be included in criteria for
success are all examples of this strategy. Using student work, classroom tests, or 
exemplars of what is expected helps students understand where they are, where they need 
to be, and an effective process for getting there. 
2. Observations go beyond walking around the room to see if students are on task or 
need clarification. Observations assist teachers in gathering evidence of student learning 
to inform instructional planning. This evidence can be recorded and used as feedback for 
students about their learning or as anecdotal data shared with them during conferences. 
3. Questioning strategies should be embedded in lesson/unit planning. Asking better 
questions allows an opportunity for deeper thinking and provides teachers with 
significant insight into the degree and depth of understanding. Questions of this nature 
engage students in classroom dialogue that both uncovers and expands learning. An “exit 
slip” at the end of a class period to determine students’ understanding of the day’s lesson 
or quick checks during instruction such as “thumbs up/down” or “red/green” (stop/go) 
cards are also examples of questioning strategies that elicit immediate information about 
student learning. Helping students ask better questions is another aspect of this formative 
assessment strategy. 
4. Self and peer assessment helps to create a learning community within a classroom. 
Students who can reflect while engaged in metacognitive thinking are involved in their
learning. When students have been involved in criteria and goal setting, self-evaluation is 
a logical step in the learning process. With peer evaluation, students see each other as 
resources for understanding and checking for quality work against previously established 
criteria. 
5. Student record keeping helps students better understand their own learning as 
evidenced by their classroom work. This process of students keeping ongoing records 
of their work not only engages students, it also helps them, beyond a “grade,” to see 
where they started and the progress they are making toward the learning goal. All of these 
strategies are integral to the formative assessment process, and they have been suggested 
by models of effective middle school instruction. 
6. Balancing Assessment. As teachers gather information/data about student learning, 
several categories may be included. In order to better understand student learning, 
teachers need to consider information about the products (paper or otherwise) students 
create and tests they take, observational notes, and reflections on the communication that 
occurs between teacher and student or among students. When a comprehensive 
assessment program at the classroom level balances formative and summative student 
learning/achievement information, a clear picture emerges of where a student is relative 
to learning targets and standards. Students should be able to articulate this shared 
information about their own learning. When this happens, student-led conferences, a 
formative assessment strategy, are valid. The more we know about individual students as 
they engage in the learning process, the better we can adjust instruction to ensure that all 
students continue to achieve by moving forward in their learning. 
The goal of formative assessment is to monitor student learning to provide ongoing feedback 
that can be used by instructors to improve their teaching and by students to improve their 
learning. More specifically, formative assessments: 
• help students identify their strengths and weaknesses and target areas that need work 
• help faculty recognize where students are struggling and address problems 
immediately 
Formative assessments are generally low stakes, which means that they have low or no point 
value. Examples of formative assessments include asking students to: 
• draw a concept map in class to represent their understanding of a topic 
• submit one or two sentences identifying the main point of a lecture 
• turn in a research proposal for early feedback
1.3. Norm-Referenced Assessment and Criterion-Referenced Assessment
When we look at the types of assessment instruments, we can generally classify them
into two main groups: criterion-referenced assessments and norm-referenced assessments.
1.3.1. Norm-referenced assessment
Linn and Gronlund (2000) define a norm-referenced assessment as a test
or other type of assessment designed to provide a measure of performance that is interpretable
in terms of an individual's relative standing in some known group. Norm-referenced tests
allow us to compare a student's skills to those of others in his age group. Norm-referenced tests are
developed by creating the test items and then administering the test to a group of students that
will be used as the basis of comparison.
The essential characteristic of norm referencing is that students are awarded their
grades on the basis of their ranking within a particular cohort. Norm referencing involves
fitting a ranked list of students' 'raw scores' to a pre-determined distribution for awarding
grades. Usually, grades are spread to fit a 'bell curve' (a 'normal distribution' in statistical
terminology), either by qualitative, informal rough-reckoning or by statistical techniques of
varying complexity. For large student cohorts (such as in senior secondary education),
statistical moderation processes are used to adjust or standardise student scores to fit a normal
distribution. Norm-referenced standardized tests compare a student's performance to that of a
norming or sample group who are in the same grade or are the same age. Student performance
is communicated in percentile ranks, grade-equivalent scores, normal curve equivalents, scaled
scores, or stanine scores.
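To make the mechanics concrete, here is a minimal sketch in Python (an illustration, not
part of the sourcebook; the helper name norm_referenced_report and the sample cohort are
invented for the example). It converts a cohort's raw scores into the norm-referenced
quantities mentioned above: z-scores, percentile ranks, and stanines.

    from statistics import mean, pstdev

    def norm_referenced_report(raw_scores):
        # Convert raw scores to z-scores, percentile ranks, and stanines.
        mu, sigma = mean(raw_scores), pstdev(raw_scores)
        report = []
        for score in raw_scores:
            z = (score - mu) / sigma
            # Percentile rank: share of the cohort scoring below this student,
            # counting half of any ties, expressed as a percentage.
            below = sum(s < score for s in raw_scores)
            ties = sum(s == score for s in raw_scores)
            percentile = 100 * (below + 0.5 * ties) / len(raw_scores)
            # Stanine: the z-score mapped onto a 1-9 scale (mean 5, SD 2).
            stanine = min(9, max(1, round(2 * z + 5)))
            report.append((score, round(z, 2), round(percentile, 1), stanine))
        return report

    cohort = [42, 55, 61, 61, 68, 70, 74, 80, 88, 95]
    for row in norm_referenced_report(cohort):
        print(row)

Every derived score here depends on the cohort: change the other students' results and each
student's percentile rank and stanine change with them, which is exactly the property
discussed above.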
1.3.2. Criterion-referenced assessment
In criterion-referenced assessment, a student's performance is measured against a standard. One form
of criterion-referenced assessment is the benchmark, a description of a key task that students
are expected to be able to perform. In contrast to norm referencing, criterion-referenced assessment,
as the name implies, involves determining a student's grade by comparing his or her achievements with clearly
stated criteria for learning outcomes and clearly stated standards for particular levels of
performance. Linn and Gronlund (2000) define a criterion-referenced assessment as a test
or other type of assessment designed to provide a measure of performance
that is interpretable in terms of a clearly defined and delimited domain of learning tasks.
Unlike norm referencing, there is no pre-determined grade distribution to be generated, and a
student's grade is in no way influenced by the performance of others. Theoretically, all
students within a particular cohort could receive very high (or very low) grades depending
solely on the levels of individuals' performances against the established criteria and standards.
The goal of criterion referencing is to report student achievement against objective reference
points that are independent of the cohort being assessed. Criterion referencing can lead to
simple pass/fail grading schemes, such as in determining fitness to practice in professional
fields. Criterion referencing can also lead to reporting student achievement or progress on a
series of key criteria rather than as a single grade or percentage. Criterion referencing is worth
aspiring towards. It requires giving thought to expected learning
outcomes: it is transparent for students, and the grades derived should be defensible in
reasonably objective terms; students should be able to trace their grades to the specifics of
their performance on set tasks. Criterion referencing lays an important framework for student
engagement with the learning process and its outcomes.
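As a contrast with the norm-referenced sketch above, the following minimal Python sketch
(again an illustration only; the cut scores are invented for the example) shows
criterion-referenced grading: every student is graded against the same fixed standards,
regardless of how the rest of the cohort performs.

    # Assumed grade bands; a real course would publish its own criteria.
    CUT_SCORES = [(85, "A"), (70, "B"), (55, "C"), (40, "D")]

    def criterion_grade(score):
        # Return the grade whose cut score the student has reached.
        for cut, grade in CUT_SCORES:
            if score >= cut:
                return grade
        return "F"

    scores = [92, 78, 78, 66, 38]
    print([criterion_grade(s) for s in scores])  # ['A', 'B', 'B', 'C', 'F']

In principle the whole class could earn an A here, something a bell-curve model rules out by
construction.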
The distinction between criterion- and norm-referenced assessment is that criterion
referencing compares a student to a standard, while norm referencing compares a student to
others. The following differences between norm-referenced tests and criterion-referenced
tests are adapted from Popham (1975).
Table 1.2 Differences between criterion-referenced tests and norm-referenced tests

Purpose
- Criterion-referenced tests: To determine whether each student has achieved specific skills
or concepts; to find out how much students know before instruction begins and after it has
finished.
- Norm-referenced tests: To rank each student with respect to the achievement of others in
broad areas of knowledge; to discriminate between high and low achievers.

Content
- Criterion-referenced tests: Measure specific skills which make up a designated curriculum.
These skills are identified by teachers and curriculum experts. Each skill is expressed as an
instructional objective.
- Norm-referenced tests: Measure broad skill areas sampled from a variety of textbooks,
syllabi, and the judgments of curriculum experts.

Item characteristics
- Criterion-referenced tests: Each skill is tested by at least four items in order to obtain an
adequate sample of student performance and to minimize the effect of guessing. The items
which test any given skill are parallel in difficulty.
- Norm-referenced tests: Each skill is usually tested by fewer than four items. Items vary in
difficulty. Items are selected that discriminate between high and low achievers.

Score interpretation
- Criterion-referenced tests: Each individual is compared with a preset standard for
acceptable achievement; the performance of other examinees is irrelevant. A student's score
is usually expressed as a percentage. Student achievement is reported for individual skills.
- Norm-referenced tests: Each individual is compared with other examinees and assigned a
score, usually expressed as a percentile, a grade-equivalent score, or a stanine. Student
achievement is reported for broad skill areas, although some norm-referenced tests do
report student achievement for individual skills.
Which of these methods is preferable? Mostly, students' grades in universities are
decided by a mix of both methods, even though there may not be an explicit policy to do so.
In fact, the two methods are somewhat interdependent, more so than the brief explanations
above might suggest. Logically, norm referencing must rely on some initial criterion
referencing, since students' 'raw' scores must presumably be determined in the first instance
by assessors who have some objective criteria in mind. Criterion referencing, on the other
hand, appears more educationally defensible. But criterion referencing may be very difficult,
if not impossible, to implement in a pure form in many disciplines. It is not always possible to
be entirely objective and to comprehensively articulate criteria for learning outcomes: some
subjectivity in setting and interpreting levels of achievement is inevitable in higher education.
This being the case, sometimes the best we can hope for is to compare individuals'
achievements relative to their peers.
Norm referencing, on its own and if strictly and narrowly implemented, is undoubtedly
unfair. With norm referencing, a student's grade depends, to some extent at least, not only on
his or her level of achievement, but also on the achievement of other students. This might lead
to obvious inequities if applied without thought to any other considerations. For example, a
student who fails in one year may well have passed in another year! The potential for
unfairness of this kind is most likely in smaller student cohorts, where norm referencing may
force a spread of grades and exaggerate differences in achievement. Alternatively, norm
referencing might artificially compress the range of difference that actually exists.
Recognising, however, that some degree of subjectivity is inevitable in higher
education, it is also worthwhile to monitor grade distributions; in other words, to use a
modest process of norm referencing to watch the outcomes of a predominantly
criterion-referenced grading model. In doing so, if it is believed too many students are receiving low
grades, or too many students are receiving high grades, or the distribution is in some way
oddly spread, then this might suggest something is amiss and the assessment process needs
looking at. There may be, for instance, a problem with the overall degree of difficulty of the
assessment tasks: not enough challenging examination questions, or
assignment tasks that fail to discriminate between students with differing levels of knowledge
and skills. There might also be inconsistencies in the way different assessors are judging
student work. Best practice in grading in higher education involves striking a balance between
criterion referencing and norm referencing. This balance should be strongly oriented towards
criterion referencing as the primary and dominant principle.
Reference:
Bastanfar, A. (2009). Alternatives in assessment. Article.
http://www3.telus.net/linguisticsissues/alternatives
Garrison, C., & Ehringhaus, M. (2010). Formative and summative assessment in the
classroom. www.measuredprogress.
Linn, R. L., & Gronlund, N. E. (2000). Measurement and assessment in teaching (8th ed.).
Upper Saddle River, NJ: Prentice Hall.
Lynch, B. K. (2001). Rethinking assessment from a critical perspective. Language Testing,
18(4), 351–372.
Popham, W. J. (1975). Educational evaluation. Englewood Cliffs, NJ: Prentice-Hall, Inc.
CHAPTER II
OBJECTS OF LEARNING EVALUATION
Purpose: After learning this material, students are expected to:
- Understand what is measured in the cognitive aspect
- Understand what is measured in the affective aspect
- Understand what is measured in the psychomotor aspect
- Be able to explain the kinds of assessment scales
2.1. Cognitive Learning Outcomes
One of the objects of evaluation is the cognitive aspect of learning. Test
questions should focus on appropriate intellectual activity, ranging from simple recall to problem
solving, critical thinking, and reasoning. Cognitive complexity refers to the various levels of
learning that can be tested. A good test reflects the goals of the instruction. If the instructor is
mainly concerned with students memorizing facts, the test should ask for simple recall of
material. If the instructor is trying to develop analytic skills, a test that asks for recall is
inappropriate and will cause students to conclude that memorization is the instructor's true
goal.
In 1956, after extensive research on educational goals, a group of examiners published its
findings in a book edited by Dr. Benjamin S. Bloom of the University of Chicago. Bloom's
Taxonomy of Educational Objectives lists six levels of intellectual understanding:
• knowledge
• comprehension
• application
• analysis
• synthesis
• evaluation
Table 2.1 Cognitive complexity, adapted from Clay (2001)
Knowledge
Explanation: Recognizing and recalling information, including dates, events, persons, places;
terms, definitions; facts, principles, theories; methods and procedures.
Examples: Who invented the...? What is meant by...? Where is the...?

Comprehension
Explanation: Understanding the meaning of information, including restating (in own words);
translating from one form to another; or interpreting, explaining, and summarizing.
Examples: Restate in your own words...? Convert fractions into...? List three reasons for...?

Application
Explanation: Applying general rules, methods, or principles to a new situation, including
classifying something as a specific example of a general principle or using a formula to solve
a problem.
Examples: How is...an example of...? How is...related to...? Why is...significant?

Analysis
Explanation: Identifying the organization and patterns within a system by identifying its
component parts and the relationships among the components.
Examples: What are the parts of...? Classify...according to... Outline/diagram...

Synthesis
Explanation: Discovering/creating new connections, generalizations, patterns, or
perspectives; combining ideas to form a new whole.
Examples: What would you infer from...? What ideas can you add to...? How would you
create a...?

Evaluation
Explanation: Using evidence and reasoned argument to judge how well a proposal would
accomplish a particular purpose; resolving controversies or differences of opinion.
Examples: Do you agree...? How would you decide about...? What priority would you
give...?
2.2. Affective Learning Outcomes
Affective learning outcomes are learning outcomes related to interests, attitudes,
and values. The affective learning outcomes were developed by Krathwohl et al., as outlined
in "Handbook II: The Affective Domain". According to Krathwohl (in Mehrens and
Lehmann, 1973), the affective domain consists of: receiving, responding, valuing, organizing,
and characterizing.
Table 2.2 Affective domain guide, adapted from Clay (2001)
Receiving
If the student must: receive information about or give attention to this new attitude, value, or
belief.
Key words for objectives, assignments, and evaluations: be alert to, be aware of, be sensitive
to, experience, listen to, look at, perceive existence, receive information on, take notes on,
take notice of, willingly attends.

Responding
If the student must: participate in, or react to, this new attitude, value, or belief in a positive
manner.
Key words: allow others to, answer questions on, contribute to, cooperate with, dialog on,
discuss openly, enjoy doing, participate in, reply to, respect those who.

Valuing
If the student must: show some definite involvement in or commitment to this new attitude,
value, or belief.
Key words: accept as right, accept as true, affirm belief/trust in, associate himself with,
assume as true, consider valuable, decide based on, indicate agreement, influence others,
justify based on, seek out more detail.

Organizing
If the student must: integrate this new attitude, value, or belief with the existing organization
of attitudes, values, and beliefs, so that it has a position of priority and advocacy.
Key words: advocate, integrate into life, judge based on, place in value system, prioritize
based on, persuade others, systematize.

Characterizing
If the student must: fully internalize this new attitude, value, or belief so that it consistently
characterizes thought and action.
Key words: act based on, consistently carry out, consistently practice, fully internalize,
known by others as, characterized by, sacrifice for, view life based on.
2.3. Psychomotor Learning Outcomes
Psychomotor learning outcomes are learning outcomes related to motor skills and the
ability to act individually. Psychomotor behaviors are performed actions that are
neuromuscular in nature and demand certain levels of physical dexterity. This assessment is
suitable for assessing competences that require learners to perform a specific
task, for example an experiment in the laboratory. The taxonomy most often used for
psychomotor learning outcomes is Simpson's (Gronlund and Linn, 1990), which consists of
perception, set, guided response, mechanism, complex overt response, adaptation, and
origination.
Table 2.3 Psychomotor domain

Perception
Description: Awareness; the ability to use sensory cues to guide motor activity. This ranges
from sensory stimulation, through cue selection, to translation.
Examples of activity: Use and/or selection of the senses to absorb data for guiding
movement. Detects non-verbal communication cues. Estimates where a ball will land after it
is thrown and then moves to the correct location to catch it. Adjusts the heat of a stove to the
correct temperature by the smell and taste of the food. Adjusts the height of the forks on a
forklift by comparing where the forks are in relation to the pallet. "By the end of the music
theatre program, students will be able to relate types of music to particular dance steps."
Action verbs: chooses, describes, detects, differentiates, distinguishes, feels, hears, identifies,
isolates, notices, recognizes, relates, selects, separates, touches.

Set
Description: Readiness to act. It includes mental, physical, and emotional sets; these three
sets are dispositions that predetermine a person's response to different situations (sometimes
called mindsets).
Examples of activity: Mental, physical, or emotional preparation before an experience or
task. Knows and acts upon a sequence of steps in a manufacturing process. Recognizes one's
abilities and limitations. Shows desire to learn a new process (motivation). Note: this
subdivision of the psychomotor domain is closely related to the "responding to phenomena"
subdivision of the affective domain. "By the end of the physical education program, students
will be able to demonstrate the proper stance for batting a ball."
Action verbs: arranges, begins, displays, explains, gets set, moves, prepares, proceeds,
reacts, shows, states, volunteers, responds, starts.

Guided response
Description: Attempt. The early stage in learning a complex skill, including imitation and
trial and error. Adequacy of performance is achieved by practicing.
Examples of activity: Imitates or follows instruction; trial and error. Performs a
mathematical equation as demonstrated. Follows instructions to build a model. Responds to
the hand signals of an instructor while learning to operate a forklift. "By the end of the
physical education program, students will be able to perform a golf swing as demonstrated
by the instructor."
Action verbs: assembles, builds, calibrates, constructs, copies, dismantles, displays, dissects,
fastens, fixes, follows, grinds, heats, imitates, manipulates, measures, mends, mixes, reacts,
reproduces, responds, sketches, traces, tries.

Mechanism
Description: Basic proficiency; the ability to perform a complex motor skill. This is the
intermediate stage in learning a complex skill. Learned responses have become habitual and
the movements can be performed with some confidence and proficiency.
Examples of activity: Competently responds to a stimulus for action. Uses a personal
computer. Repairs a leaking faucet. Drives a car. "By the end of the biology program,
students will be able to assemble laboratory equipment appropriate for experiments."
Action verbs: assembles, builds, calibrates, completes, constructs, dismantles, displays,
fastens, fixes, grinds, heats, makes, manipulates, measures, mends, mixes, organizes,
performs, shapes, sketches.

Complex overt response
Description: Expert proficiency; the skillful performance of motor acts that involve complex
movement patterns. Proficiency is indicated by a quick, accurate, and highly coordinated
performance requiring a minimum of energy. This category includes performing without
hesitation and automatic performance. For example, players often utter sounds of
satisfaction or expletives as soon as they hit a tennis ball or throw a football, because they
can tell by the feel of the act what the result will produce.
Examples of activity: Executes a complex process with expertise. Maneuvers a car into a
tight parallel parking spot. Operates a computer quickly and accurately. Displays
competence while playing the piano. "By the end of the industrial education program,
students will be able to demonstrate proper use of woodworking tools to high school
students."
Action verbs: the same as for mechanism (assembles, builds, calibrates, constructs,
coordinates, demonstrates, dismantles, displays, dissects, fastens, fixes, grinds, heats,
manipulates, measures, mends, mixes, organizes, sketches), but with adverbs or adjectives
that indicate that the performance is quicker, better, more accurate, etc.

Adaptation
Description: Adaptable proficiency; a learner's ability to modify motor skills to fit a new
situation. Skills are well developed and the individual can modify movement patterns to fit
special requirements.
Examples of activity: Alters responses to reliably meet varying challenges. Responds
effectively to unexpected experiences. Modifies instruction to meet the needs of the learners.
Performs a task with a machine that it was not originally intended to do (the machine is not
damaged and there is no danger in performing the new task). "By the end of the industrial
education program, students will be able to adapt their lessons on woodworking skills for
disabled students."
Action verbs: adapts, adjusts, alters, changes, integrates, rearranges, reorganizes, revises,
solves, varies.

Origination
Description: Creative proficiency; a learner's ability to create new movement patterns to fit
a particular situation or specific problem. Learning outcomes emphasize creativity based
upon highly developed skills.
Examples of activity: Develops and executes new integrated responses and activities.
Constructs a new theory. Develops a new and comprehensive training program. Creates a
new gymnastic routine.
Action verbs: arranges, builds, combines, composes, constructs, creates, designs,
formulates, initiates, makes, modifies, originates, re-designs, trouble-shoots.
2.4. Types of Value Scales
Scales of measurement refer to the ways in which variables/numbers are defined and
categorized. Each scale of measurement has certain properties, which in turn determine the
appropriateness of certain statistical analyses. There are four measurement scales (or
types of data): nominal, ordinal, interval, and ratio.
2.4.1. Nominal value scale
A nominal value scale is a scale used to identify objects, individuals, or groups. A
questionnaire item answered yes (1) or no (0) is an example of a nominal value scale; such
values are the least like real numbers. Nominal basically refers to discrete categories such as
the name of your school, the type of car you drive, gender, religion, menu items selected, etc.
What is your gender?
M - Male
F - Female
Which recreational activities do you participate in?
1 - Hiking
2 - Fishing
3 - Boating
4 - Swimming
5 - Picnicking
A sub-type of nominal scale with only two categories (e.g. male/female) is called
"dichotomous." Nominal data can be clearly described in pie charts because they comprise
distinct categories that sum to 100%.
2.4.2. Ordinal value scale
An ordinal value scale is a scale that has a rank form: it orders
observations from low to high, with any ties attributed to a lack of measurement sensitivity,
e.g. scores from a questionnaire. For example: first rank, second rank, and so on. A questionnaire
with a Likert scale uses an ordinal value scale, such as disagree (1), doubtful (2), and agree (3).
Ordinal data refer to quantities that have a natural ordering: the ranking of favorite sports, the
order of people placed in a line, the order of runners finishing a race, or, more often, the choice
on a rating scale from 1 to 5. Further examples: class ranks, social class categories, etc.
Example:
How satisfied are you with our service?
1. Very unsatisfied
2. Unsatisfied
3. Neutral
4. Satisfied
5. Very satisfied
How do you feel today?
1. Very unhappy
2. Unhappy
3. OK
4. Happy
5. Very happy
2.4.3. Interval value scale
An interval value scale shares the properties of the nominal and ordinal scales, but it
also has a fixed, defined interval between values and can be used in mathematical operations.
Examples include how many times a woman goes to the market (once, twice, etc.)
or a final test score. Interval data are like ordinal data, except that the intervals between
values are equally split. The most common example is temperature in degrees Fahrenheit:
the difference between 29 and 30 degrees is the same magnitude as the difference between
78 and 79.
2.4.4. Ratio value scale
A ratio value scale is a true numerical scale: the distances between adjacent values are
equal, and values can be used in mathematical operations. Ratio scales are the easiest to
understand because they are numbers as we usually think of them. A score of zero on a ratio
scale means that there is none of whatever is being measured. Most ratio scales are counts of
things; ratio data are interval data with a natural zero point. Examples: weight, the distance
of a street, the time to complete a task, the size of an object, etc.
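The following minimal Python sketch (an illustration, not from the sourcebook; the sample
data are invented) shows why the scale type matters in practice: it determines which summary
statistics are meaningful.

    from statistics import mode, median, mean

    gender = ["M", "F", "F", "M", "F"]   # nominal: categories only
    likert = [1, 2, 2, 3, 3]             # ordinal: ordered, but gaps may be unequal
    temp_f = [29.0, 30.0, 78.0, 79.0]    # interval: equal gaps, no true zero
    weight = [48.5, 50.0, 61.2, 75.0]    # ratio: equal gaps and a true zero

    print(mode(gender))           # nominal data support only counts and the mode
    print(median(likert))         # ordinal data add the median and percentiles
    print(mean(temp_f))           # interval data add means and differences
    print(weight[3] / weight[0])  # only ratio data support ratios ("1.5 times heavier")

Note that the reverse statements do not hold: a mean of Likert codes, or a ratio of Fahrenheit
temperatures (80 degrees is not "twice as hot" as 40 degrees), is not meaningful, because those
scales lack equal intervals and a true zero respectively.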
Reference:
Clay, B. (2001). Is this a trick question? A short guide to writing effective test questions.
Kansas State Department of Education.
Garrison, C., & Ehringhaus, M. (1995). Formative and summative assessment in the
classroom.
Gronlund, N. E. (1981). Measurement and evaluation. New York: Macmillan Publishing Co.
Popham, W. J. (1981). Modern educational measurement. Englewood Cliffs, NJ:
Prentice-Hall, Inc.
Purwanto. (2008). Evaluasi hasil belajar. Surakarta: Pustaka Pelajar.
CHAPTER III
TEST ASSESSMENT
Purpose: After learning this material, students are expected to:
- Be able to explain test assessment
- Be able to describe the kinds of test assessment
- Understand the advantages and disadvantages of using tests
Figure 3.1 Alternative assessment: decision making in educational settings (tests and
measurement form a small quantitative area within the wider range of non-test procedures
and assessment by description)
As Figure 3.1 shows, tests constitute only a small set of options, among a wide range
of other options, for a language teacher to make decisions about students. The judgment
emanating from a test is not necessarily more valid or reliable than one deriving from
qualitative procedures, since both should meet reliability and validity criteria to be considered
informed decisions. The area circumscribed within quantitative decision making is relatively
small and represents a specific choice made by the teacher at a particular time in the course,
while the vast area outside, which covers all non-measurement qualitative assessment
procedures, represents the wider range of procedures and their general nature. This means that
the qualitative approaches, which result in descriptions of individuals, as contrasted with
quantitative approaches, which result in numbers, can go hand in hand with the teaching and
learning experiences in the class, and they can reveal more subtle shades of students'
proficiency.
A test is a method of measuring a person's ability, knowledge, or performance in completing
certain tasks or demonstrating mastery of a skill or knowledge of content. A test is a systematic
procedure for observing persons and describing them with either a numerical scale or a
category system. Thus a test may give either qualitative or quantitative information. Two types
of test are the objective test and the essay test. Essay tests are appropriate when:
• The group to be tested is small and the test is not to be reused.
• You wish to encourage and reward the development of student skill in
writing.
• You are more interested in exploring the student's attitudes than in
measuring his/her achievement.
Objective tests are appropriate when:
• The group to be tested is large and the test may be reused.
• Highly reliable scores must be obtained as efficiently as possible.
• Impartiality of evaluation, fairness, and freedom from possible test
scoring influences are essential.
Either essay or objective tests can be used to 1) measure almost any important
educational achievement a written test can measure, 2) test understanding and the ability to
apply principles, 3) test the ability to think critically, and 4) test the ability to solve problems.
3.1. Objective
Objective tests measure both your ability to remember facts and figures and your
understanding of course materials. These tests are often designed to make you think
independently, so don't count on recognizing the right answer. Instead, prepare yourself for
high-level critical reasoning and making fine discriminations to determine the best answer.
Taking an objective examination is somewhat different from taking an essay examination.
The objective examination may be composed of true/false, multiple choice, or matching
responses. Also included occasionally is a fill-in section. There are certain things that you
must remember to do as you take this kind of test. First, roughly decide how to divide your
time. Quickly glance over the pages to see how many kinds of questions are being used and
how many there are of each kind. Secondly, carefully read the instructions and make sure that
you understand them before you begin to work. Indicate your answers exactly as specified in
the instructions. If your instructor has not indicated whether there is a penalty for guessing,
ask him or her about it; then, if there is a penalty, do not guess.
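Such guessing penalties are usually implemented with the classic correction-for-guessing
formula, score = R - W/(k - 1), where R is the number right, W the number wrong, and k the
number of options per item. The following is a minimal Python sketch of that common formula
(an illustration; the sourcebook itself does not prescribe a particular scoring rule):

    def corrected_score(right, wrong, options_per_item):
        # Formula score R - W/(k - 1); omitted items are simply not counted.
        # Random guessing gains about one right answer per (k - 1) wrong ones,
        # so that expected gain is subtracted back out.
        return right - wrong / (options_per_item - 1)

    # A student with 30 right and 10 wrong on a 4-option multiple choice test:
    print(corrected_score(30, 10, 4))   # 26.66..., the expected guessing gain removed
    # On a true/false test (k = 2), every wrong answer cancels one right answer:
    print(corrected_score(30, 10, 2))   # 20.0

Under this rule, a student who guesses at random gains nothing on average, which is why
guessing does not pay when a penalty is announced.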
3.1.1. Multiple choice
A multiple choice test has items formatted as multiple choice questions, and the
candidate must choose which answer or group of answers is correct. The multiple choice
question consists of two parts: 1) the stem, the statement or question which identifies the
problem, and 2) the choices, the incorrect ones of which are known as distracters. Usually,
students are asked to select the one alternative that best completes a statement or answers a
question. Multiple choice items can also provide an excellent basis for post-test discussion,
especially if the discussion addresses why the incorrect responses were wrong as well as why
the correct responses were right. Unfortunately, multiple choice items are difficult and time
consuming to construct well. They may also appear too discriminating (picky) to students,
especially when the alternatives are well constructed, and are open to misinterpretation by
students who read more into questions than is there. Multiple choice tests can be used to test
the ability to:
1. Recall memorized information
2. Apply theory to routine cases
3. Apply theory to novel situations
4. Use judgment in analyzing and evaluating
Example of multiple choice:
A three-year-old child can usually be expected to:
a. Cry when separated from his or her mother
b. Have imaginary friends
c. Play with other children of the same age
d. Constantly argue with older siblings
3.1.2. True/false questions
True/false questions present candidates with a binary choice: a statement is either true or
false. The format invites guessing, a problem whose severity depends on the number of
questions. Also a popular question type, the true/false question has only two options. True/false
questions usually state the relation of two things to one another. Because the instructor is
interested in knowing whether you know when and under what circumstances something is or
is not true, s/he usually includes some qualifiers in the statement. The qualifiers must be
carefully considered. With the following qualifiers, you are wiser to guess "true" if you don't
know the answer, because you may stand some chance of getting the answer right: most, some,
usually, sometimes, and great. On the other hand, with these next qualifiers, you should guess
"false" unless you are certain that the statement is true: all, no, always, is, never, is not, good,
bad, equal, less.
The following are advantages of true or false tests:
• Can test large amounts of content 
• Students can answer 3-4 questions per minute 
And the disadvantages are: 
• They are easy 
• It is difficult to discriminate between students that know the material and students who 
do not 
• Students have a 50-50 chance of getting the right answer by guessing 
• Need a large number of items for high reliability 
Example of true or false questions:
1. Electrons are larger than molecules.
a. True b. False
2. True or false? The study of plants is known as botany.
a. True b. False
3. True or false? Is it recommended to take statements directly from the text to make
good true/false questions?
a. True b. False
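Why true/false tests need a large number of items for reliability can be made concrete with a
little probability. The following minimal Python sketch (an illustration, not from the
sourcebook; the 70% pass mark is an assumed example) computes the binomial probability of
reaching a pass mark by pure guessing:

    from math import comb

    def p_pass_by_guessing(n_items, pass_mark, p_guess=0.5):
        # Binomial probability of guessing at least pass_mark items correctly.
        return sum(comb(n_items, k) * p_guess**k * (1 - p_guess)**(n_items - k)
                   for k in range(pass_mark, n_items + 1))

    for n in (10, 20, 50):
        pass_mark = (7 * n + 9) // 10   # ceiling of 70% of the items
        print(n, round(p_pass_by_guessing(n, pass_mark), 4))
    # The chance of passing by luck alone drops from about 0.17 with
    # 10 items to under 0.01 with 50 items.

The 50-50 odds on any single item thus matter far less as the test grows, which is the reason
behind the "large number of items for high reliability" advice in the list above.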
3.1.3. Matching questions
A matching item provides defined terms and requires the test taker
to match identifying characteristics to the correct terms. Matching questions give students
some opportunity for guessing. Students must know the information well, in that they are
presented with two columns of items between which they must establish relationships;
however, if only one match is allowed per item, then once items have been eliminated, a few
of the latter ones may be guessed. A simple matching item consists of two
columns: one column of stems or problems to be answered, and another column of responses
from which the answers are to be chosen. Traditionally, the column of stems is placed on the
left and the column of responses is placed on the right.
Example: 
Directions: match the following! 
Water A. NaCl 
Discovered radium B. H2O 
Salt C. Fermi 
Ammonia D. NH3 
E. Curie 
3.1.4. Completion type
A completion (fill-in-the-blank) item provides the test taker with an identifying
characteristic and requires the test taker to recall the correct term. Completion items are
especially useful in assessing mastery of factual information when a specific word or phrase is
important to know. There are several varieties of completion item. The easiest version provides
a word bank of possible words that will fill in the blanks; in some exams, each word in the word
bank is used exactly once. If a teacher wanted to create a test of medium difficulty, they would
provide a test with a word bank, but some words may be used more than once and others not
at all. The hardest variety of such a test is a fill-in-the-blank test in which no word bank is
provided at all. This generally requires a higher level of understanding and memory than
multiple choice. Advantages:
• Good for who, what, where, when content
• Minimizes guessing
• Encourages more intensive study: the student must know the answer rather
than merely recognize it
• Can usually provide an objective measure of student achievement or
ability
Disadvantages:
• Difficult to assess higher levels of learning, because the answers to
completion items are usually limited to a few words
• Difficult to construct so that the desired response is clearly indicated
• May overemphasize memorization of facts
• Questions may have more than one correct answer
• Scoring is time consuming
A completion item requires the student to answer a question or to finish an incomplete
statement by filling in a blank with the correct word or phrase.
For example:
A subatomic particle with a negative electric charge is called a(n) ____________.
3.2. Essay
An essay test is a test that requires the student to compose responses, usually lengthy, up
to several paragraphs; essay tests measure higher-level thinking. A typical essay test usually
consists of a small number of questions for which the student is expected to recall and organize
knowledge in logical, integrated answers; the questions test higher-level processes such as
analysis, synthesis, evaluation, and creativity. The distinctive feature of the essay-type test is the
freedom of response: pupils are free to select, relate, and present ideas in their own words.
Items such as short answer or essay typically require a test taker to write a response that fulfills
the requirements of the item. In administrative terms, essay items take less time to construct.
As an assessment tool, essay items can test complex learning objectives as well as the processes
used to answer the questions. The items can also provide more realistic and generalizable tasks
for a test. Finally, these items make it difficult for test takers to guess the correct answers and
require test takers to demonstrate their writing skills as well as correct spelling and grammar.
Uses of essay tests:
a. Assess the ability to recall, organize, and integrate ideas.
b. Assess the ability to express oneself in writing.
c. Assess the ability to supply information.
d. Assess student understanding of subject matter.
e. Measure the knowledge of factual information.
The main advantages of essay and short answer items are that they permit students to
demonstrate achievement of such higher-level objectives as analyzing and critical thinking.
Written items offer students the opportunity to use their own judgment, writing styles, and
vocabularies. They are less time consuming to prepare than any other item type. Research
indicates that students study more efficiently for essay-type examinations than for selection
(multiple choice) tests. Students preparing for essay tests focus on broad issues, general
concepts, and interrelationships rather than on specific details. This studying results in
somewhat better student performance regardless of the type of exam they are given. Essay
tests also give the instructor an opportunity to comment on students' progress, the quality of
their thinking, the depth of their understanding, and the difficulties they may be having.
The following are the advantages of essay tests:
• Students less likely to guess 
• Easy to construct 
• Stimulates more study 
• Allows students to demonstrate ability to organize knowledge, express opinions, show 
originality. 
Disadvantages: 
• Can limit amount of material tested, therefore has decreased validity. 
• Subjective, potentially unreliable scoring. 
• Time consuming to score. 
Types of essay test:
3.2.1. Restricted response
The restricted response question usually limits both the content and the response. The
content is usually restricted by the scope of the topic to be discussed; limitations on the form
of the response are generally indicated in the question. Another way of restricting responses in
essay tests is to base the questions on specific problems. For this purpose, introductory
material like that used in interpretive exercises can be presented. Such items differ from
objective interpretive exercises only in that essay questions are used instead of
multiple choice or true/false items. Because the restricted response question is more
structured, it is most useful for measuring learning outcomes requiring the interpretation and
application of data in a specific area. Examples of restricted response questions: describe two
situations that demonstrate the application of the law of supply and demand; state any five
definitions of education.
Advantages of restricted response questions:
• They are more structured
• They measure specific learning outcomes
• They provide for more ease of assessment
• Any outcome measured by an objective interpretive exercise can be measured by
a restricted response question
3.2.2. Extended response
Extended response questions allow students to select the information that they think is
pertinent, to organize the answer in accordance with their best judgment, and to integrate and
evaluate ideas as they think suitable. No restriction is placed on the student as to the points he
will discuss or the type of organization he will use, and no limits are set on the length or
exact content to be discussed.
Teachers frame these types of questions so as to give students the maximum possible
freedom to determine the nature and scope of the response, within the bounds of the topic and
a stipulated time frame. The student may select the points he thinks are most important,
pertinent, and relevant, and may arrange and organize the answer in whichever way he wishes;
hence they are also called free response questions. This enables the teacher to judge students'
abilities to organize, integrate, and interpret material and to express themselves in their own
words. It also gives an opportunity to comment on and look into students' progress, the quality
of their thinking, the depth of their understanding, their problem-solving skills, and the
difficulties they may be having. These skills interact with each other and with the knowledge
and understanding the problem requires. Thus it is at the levels of synthesis and evaluation,
and in writing skills, that this type of question makes its greatest contribution. Examples:
1) Describe at length the defects of the present-day examination system in the state of
Maharashtra, and suggest ways and means of improving it. 2) Describe the character of
Hamlet. 3) "Global warming is the next step to disaster."
References:
Clay, B. (2001). Is This a Trick Question? A Short Guide to Writing Effective Test Questions.
Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological
Bulletin.
Garrison, C., & Ehringhaus, M. (1995). Formative and Summative Assessment in the Classroom.
Gronlund, N. E. (1981). Measurement and Evaluation. New York: Macmillan Publishing Co.
Popham, W. J. (1981). Modern Educational Measurement. Englewood Cliffs, NJ: Prentice Hall,
Inc.
Purwanto. (2008). Evaluasi Hasil Belajar. Surakarta: Pustaka Pelajar.
CHAPTER IV 
NON TEST ASSESSMENT 
Purpose: After learning this material, students are expected to:
- be able to explain non-test assessment
- be able to describe the kinds of non-test assessment
A non-test instrument is an instrument other than an academic achievement test. The
item-writing procedure for non-test instruments is the same as the procedure for writing items
for achievement tests: construct the test blueprint, write items according to the blueprint,
review the items, validate them, try them out, and refine them based on the results of the trials.
4.1. Observation
Observation should follow an established plan or checklist organized around concrete,
objective data, and it needs to be tied to the objectives of the course. Through observation,
teachers can assess their students' abilities simply by watching their classroom behavior or
completion of activities. By watching students as they work, teachers can identify signs of
struggle and determine where a child may be experiencing academic difficulties. Because
students often do not realize that they are being observed, teachers can ensure that the picture
they receive of student understanding represents the student's actual abilities. For most
practitioners, observation is a feature of everyday working life, and practitioners can often be
found with a notebook and pen close to hand to jot down unplanned observations that can be
added to normal recording systems at a later time. However, as previously discussed, specific
observations should be planned. Prior to beginning the observation, practitioners should work
through the stages outlined in the previous section and, as part of this process, select the most
appropriate observational method from the range available. It will also be helpful to produce a
cover sheet including such details as:
• child’s name 
• child’s age 
• date 
• name of observer 
• the specific setting or area of setting 
• permissions gained 
• aims and purpose of observation 
• start and finish times. 
4.2. Interview 
Interviews are the most frequently used method of personnel selection, but they are also
used for school admissions, promotions, scholarships, and other awards. Interviews vary in
their content and structure. In a structured interview, questions are prepared before the
interview starts. An unstructured interview simply represents a free conversation between an
interviewer and interviewee, giving the interviewer the freedom to adaptively or intuitively
switch topics. Research has shown that unstructured interviews lack predictive validity, or
show lower predictive validity than structured interviews. The best practices for conducting
interviews are:
• High degree of structure 
• Selection of questions according to job requirements 
• Assessment of aspects that cannot be better assessed with other methods 
• Scoring with pre-tested, behavior-anchored rating scales 
• Empirical examination of each question 
• Rating only after the interview 
• Standardized scoring 
• Training of interviewers 
Structured interviews can be divided into three types: 
a. Behavioral description interview involves questions that refer to past behavior in real 
situations, also referred to as job-related interview. 
b. Situational interview uses questions that require interviewees to imagine hypothetical 
situations (derived from critical incidents) and state how they would act in such 
situations. 
c. Multimodal interview combines the two approaches above and adds unstructured parts 
to ensure high respondent acceptance.
Analyses of the predictive validity of interviews for job performance have shown that they
are good predictors of job performance, that they add incremental validity above and beyond
general mental ability, and that behavioral description interviews show higher validity than
situational interviews. Interviews are less predictive of academic performance than of
job-related outcomes. Predictive validity probably also depends on the content of the
interview, but the available analyses aggregated interviews with different contents.
4.3. Questionnaire 
Questionnaires are the most commonly used method for collecting information from 
program participants when evaluating educational and extension programs. There are nine 
steps involved in the development of a questionnaire: 
1. Decide the information required. 
2. Define the target respondents. 
3. Choose the method(s) of reaching your target respondents. 
4. Decide on question content. 
5. Develop the question wording. 
6. Put questions into a meaningful order and format. 
7. Check the length of the questionnaire. 
8. Pre-test the questionnaire. 
9. Develop the final survey form. 
4.4. Portfolios
A portfolio is a collection of student work with a common theme or purpose. Like a
photographer's portfolio, it should contain the best examples of the student's work. For
subjects that are paper-based, collecting a portfolio is simple. Homework is a structured
practice exercise that usually plays a part in grading. Sometimes instructors assign reading or
other homework that covers the theoretical aspects of the subject matter, so that class time
can be used for more hands-on practical work. In a portfolio assessment, a teacher looks not at
one piece of work as a measure of student understanding, but instead at the body of work the
student has produced over a period of time. To allow for a portfolio assessment, a teacher
must compile student work throughout the term. This is commonly accomplished by
providing each student with a folder in which to store essays or other large activities. Upon
compilation of the portfolio, the teacher can review the body of work and determine the
degree to which the work indicates the student's understanding of the content.
Advantages of Portfolio Assessment 
• Assesses what students can do and not just what they know. 
• Engages students actively. 
• Fosters student-teacher communication and depth of exploration. 
• Enhances understanding of the educational process among parents and in the community. 
• Provides goals for student learning. 
• Offers an alternative to traditional tests for students with special needs. 
The use of the portfolio as an assessment tool is a process with multiple steps. The 
process takes time, and all of the component parts must be in place before the assessment can 
be utilized effectively. 
a. Decide on a purpose or theme. General assessment alone is not a sufficient goal for a 
portfolio. It must be decided specifically what is to be assessed. Portfolios are most useful 
for addressing the student’s ability to apply what has been learned. Therefore, a useful 
question to consider is, What skills or techniques do I want the students to learn to apply? 
The answer to this question can often be found in the school curriculum. 
b. Consider what samples to include. Think about what samples of student work might best illustrate the
application of the standard or educational goal in question. Written work samples, of 
course, come to mind. However, videotapes, pictures of products or activities, and 
testimonials are only a few of the many different ways to document achievement. 
c. Determine how samples will be selected. A range of procedures can be utilized here. 
Students, maybe in conjunction with parents and teachers, might select work to be 
included, or a specific type of sample might be required by the teacher, the school, or the 
school system. 
d. Decide whether to assess the process and the product or the product only. Assessing the 
process would require some documentation regarding how the learner developed the 
product. For example, did the student use the process for planning a short story or 
utilizing the experimental method that was taught in class? Was it used correctly? 
Evaluation of the process will require a procedure for accurately documenting the process 
used. The documentation could include a log or video of the steps or an interview with 
the student. Usually, if both the process and the product are to be evaluated, a separate 
scoring system will have to be developed for each.
e. Develop an appropriate scoring system. Usually this is best done through the use of a 
rubric, a point scale with descriptors that explain how the work will be evaluated. Points 
are allotted with the highest quality work getting the most points. If the descriptors are 
clear and specific, they become goals for which the student can aim. There should be a 
separate scale for each standard being evaluated. For example, if one standard being 
assessed is the use of grammatically correct sentence structure, five points might be 
allotted if all sentences are grammatically correct. Then, a specific number of errors 
would be identified for all other points with zero points given if there are more than a 
certain number of errors. It is important that the standards for evaluation be carefully 
explained. If we evaluate for clarity of writing, then an operational description of what is 
meant by clarity should be provided. The number of points available should be small enough
to be practical and meaningful; an allotment of 20 points for clarity is not workable because an
evaluator cannot really distinguish between a 17- and an 18-point product with regard to
clarity. (A minimal code sketch of such a rubric appears after this list.)
f. Share the scoring system with the students. Qualitative descriptors of how the student 
will be evaluated, known in advance, can guide learning and performance. 
g. Engage the learner in a discussion of the product. Through the process of discussion the 
teacher and the learner can explore the material in more depth, exchange feelings and 
attitudes with regard to the product and the learning process, and reap the greatest 
advantage of effective portfolio implementation. 
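To make the rubric idea concrete, here is a minimal sketch of rubric-based scoring in Python. The rubric structure, descriptor wording, point values, and function name are illustrative assumptions only, not a prescribed scheme; a real rubric would be derived from the standards actually being assessed.

```python
# A minimal rubric scorer: each standard has a small point scale with
# descriptors, and a portfolio sample receives one level per standard.
# The rubric contents below are invented for illustration.

rubric = {
    "sentence_structure": {  # standard being assessed
        5: "All sentences grammatically correct",
        3: "One or two grammatical errors",
        1: "More than two grammatical errors",
        0: "Pervasive grammatical errors",
    },
    "clarity": {
        3: "Main idea stated and supported throughout",
        2: "Main idea stated but unevenly supported",
        1: "Main idea difficult to identify",
    },
}

def score_sample(levels: dict) -> int:
    """Sum the points awarded per standard, validating against the rubric."""
    total = 0
    for standard, points in levels.items():
        if points not in rubric[standard]:
            raise ValueError(f"{points} is not a defined level for {standard}")
        total += points
    return total

# Example: a sample judged at level 3 for sentence structure, 2 for clarity.
print(score_sample({"sentence_structure": 3, "clarity": 2}))  # -> 5
```

Keeping a separate scale per standard, as the text recommends, is what lets the scorer report strengths and weaknesses rather than a single undifferentiated grade.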
4.5. Case Studies and Problem-Solving Assignments
Case studies and problem-solving assignments can be used to apply knowledge. This type
of assignment requires students to place themselves in, or react to, a situation where their
prior learning is needed to solve the problem or evaluate the situation. Case studies should be
realistic and practical, with clear instructions.
4.6. Project
Projects are usually designed so that students can apply many of the skills they have
developed in the course by producing a product of some kind. Usually project assignments
are given early in the course, with a completion date toward the end of the quarter. By asking
students to complete a project, teachers can see how well their pupils can apply taught
information. Successful completion of a project requires a student to translate their learning
into the completion of a task. Project-based assessment more closely approximates how
students will be assessed in the real world, as employers will not ask their employees to take
tests, but will instead judge their merit by the work they complete. A project is an example of
a performance task.
References:
Arvey, R. D., & Campion, J. E. (1982). The employment interview: A summary and review of
recent research. Personnel Psychology.
Clay, B. (2001). Is This a Trick Question? A Short Guide to Writing Effective Test Questions.
Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological
Bulletin.
Damiani, V. B. (2004). Portfolio Assessment in the Classroom. National Association of
School Psychologists.
Garrison, C., & Ehringhaus, M. (1995). Formative and Summative Assessment in the Classroom.
Gronlund, N. E. (1981). Measurement and Evaluation. New York: Macmillan Publishing Co.
Janz, T., Hellervik, L., & Gilmore, D. C. (1986). Behavior Description Interviewing (BDI).
Boston: Allyn & Bacon.
Latham, G. P., Saari, L. M., Pursell, E. D., & Campion, M. A. (1980). The situational
interview (SI). Journal of Applied Psychology.
Popham, W. J. (1981). Modern Educational Measurement. Englewood Cliffs, NJ: Prentice Hall,
Inc.
Purwanto. (2008). Evaluasi Hasil Belajar. Surakarta: Pustaka Pelajar.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in
personnel psychology. Psychological Bulletin.
Schuler, H. (2002). Das Einstellungsinterview (Multimodales Interview [MMI]). Göttingen:
Hogrefe.
CHAPTER V
VALIDITY TEST
Purpose:
- Be able to explain the definition of test validity
- Be able to explain the function of a validity test
- Be able to carry out a test of validity
Test validity is the extent to which a test (such as a chemical, physical, or scholastic test)
accurately measures what it purports to measure. Validity is divided into various kinds, such as
content validity, criterion validity, and construct validity.
5.1. Content Validity
Content validity is an estimate of how well a measure represents every element of a
construct; it is the extent to which the content of the test matches the instructional objectives.
For example, a semester or quarter exam that only includes content covered during the last six
weeks is not a valid measure of the course's overall objectives. It has very low content
validity.
5.2. Criterion Validity
Criterion validity assesses whether a test reflects a certain set of abilities. If the
criterion is obtained some time after the test is given, one is studying predictive validity. If
the test score and criterion score are determined at essentially the same time, one is studying
concurrent validity.
Concurrent validity measures the test against a benchmark test, and a high correlation
indicates that the test has strong criterion validity. In concurrent validity, we assess the
operationalization's ability to distinguish between groups that it should theoretically be able to
distinguish between. For example, if we come up with a way of assessing manic depression,
our measure should be able to distinguish between people who are diagnosed with manic
depression and those diagnosed as paranoid schizophrenic. If we want to assess the concurrent
validity of a new measure of empowerment, we might give the measure to both migrant farm
workers and farm owners, theorizing that our measure should show that the farm
owners are higher in empowerment. As in any discriminating test, the results are more
powerful if you are able to show that you can discriminate between two groups that are very
similar. If end-of-year math tests in 4th grade correlate highly with the statewide math
tests, they have high concurrent validity.
Predictive validity is a measure of how well a test predicts abilities. It involves testing
a group of subjects on a certain construct and then comparing them with results obtained at
some point in the future. In predictive validity, we assess the operationalization's ability to
predict something it should theoretically be able to predict. For instance, we might theorize
that a measure of math ability should be able to predict how well a person will do in an
engineering-based profession. We could give our measure to experienced engineers and see if
there is a high correlation between scores on the measure and their salaries as engineers. A
high correlation would provide evidence for predictive validity: it would show that our
measure can correctly predict something that we theoretically think it should be able to
predict.
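In practice, a criterion validity coefficient is usually just the correlation between test scores and criterion scores. The sketch below illustrates this with NumPy; the variable names and all scores are invented solely for illustration.

```python
import numpy as np

# Hypothetical data: a math-ability test score for ten people, and a
# criterion measured later (e.g. a performance rating). Invented numbers.
test_scores = np.array([62, 70, 75, 58, 80, 90, 66, 72, 85, 77])
criterion   = np.array([3.1, 3.4, 3.6, 2.9, 3.9, 4.5, 3.2, 3.5, 4.2, 3.8])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry
# is the Pearson correlation, read here as the predictive validity coefficient.
r = np.corrcoef(test_scores, criterion)[0, 1]
print(f"criterion validity coefficient r = {r:.2f}")
```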
5.3. Construct Validity
Construct validity is an assessment of how well you have translated your ideas or theories
into actual programs or measures; it defines how well a test or experiment measures up to its
claims. A test designed to measure depression must measure only that particular construct, not
closely related constructs such as anxiety or stress. Construct validity refers to the degree to
which inferences can legitimately be made from the operationalizations in your study to the
theoretical constructs on which those operationalizations were based. Like external validity,
construct validity is related to generalizing. But where external validity involves generalizing
from the context of your study to other people, places, or times, construct validity involves
generalizing from your program or measures to the concepts behind your program or
measures.
Convergent validity tests that constructs that are expected to be related are, in fact, 
related. In convergent validity, we examine the degree to which the operationalization is 
similar to (converges on) other operationalizations that it theoretically should be similar to. 
For instance, to show the convergent validity of a Head Start program, we might gather 
evidence that shows that the program is similar to other Head Start programs. Or, to show the 
convergent validity of a test of arithmetic skills, we might correlate the scores on our test 
with scores on other tests that purport to measure basic math ability, where high correlations 
would be evidence of convergent validity. 
Discriminant validity (also referred to as divergent validity) tests that constructs that
should have no relationship do, in fact, have no relationship. In discriminant validity, we
examine the degree to which the operationalization is not similar to (diverges from) other
operationalizations that it theoretically should not be similar to. For instance, to show the
discriminant validity of a Head Start program, we might gather evidence that shows that the
program is not similar to other early childhood programs that don't label themselves as Head
Start programs. Or, to show the discriminant validity of a test of arithmetic skills, we might
correlate the scores on our test with scores on tests of verbal ability, where low
correlations would be evidence of discriminant validity.
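A simple way to inspect convergent and discriminant evidence together is a correlation matrix over several measures: correlations between measures of the same construct should be high, and correlations with a measure of a theoretically unrelated construct should be low. The sketch below, with invented data, is one such check.

```python
import numpy as np

# Invented scores for 8 students on three measures: two arithmetic tests
# (same construct) and one verbal test (a different construct).
arith_a = np.array([55, 60, 72, 48, 80, 66, 74, 59])
arith_b = np.array([52, 63, 70, 50, 78, 64, 75, 57])
verbal  = np.array([63, 51, 61, 57, 58, 53, 55, 58])

corr = np.corrcoef([arith_a, arith_b, verbal])
print(f"arith_a vs arith_b (convergent, expect high): {corr[0, 1]:.2f}")
print(f"arith_a vs verbal  (discriminant, expect low): {corr[0, 2]:.2f}")
```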
References:
Clay, B. (2001). Is This a Trick Question? A Short Guide to Writing Effective Test Questions.
Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological
Bulletin.
Garrison, C., & Ehringhaus, M. (1995). Formative and Summative Assessment in the Classroom.
Gronlund, N. E. (1981). Measurement and Evaluation. New York: Macmillan Publishing Co.
Popham, W. J. (1981). Modern Educational Measurement. Englewood Cliffs, NJ: Prentice Hall,
Inc.
Purwanto. (2008). Evaluasi Hasil Belajar. Surakarta: Pustaka Pelajar.
CHAPTER VI
RELIABILITY TEST
Purpose:
- Be able to explain the definition of test reliability
- Be able to explain the function of a reliability test
- Be able to carry out a test of reliability
Reliability relates to the consistency of an assessment. Reliability is a necessary but
not sufficient condition for validity. For instance, if the needle of a scale sits five pounds
away from zero, it always over-reports my weight by five pounds: the measurement is
consistent, but it is consistently wrong, so the measurement is not valid. A reliable assessment
is one that consistently achieves the same results with the same (or a similar) cohort of students.
Various factors affect reliability, including ambiguous questions, too many options within a
question paper, vague marking instructions, and poorly trained markers. Reliability testing
methods can be divided into two groups: external consistency methods and internal
consistency methods.
6.1. External Consistency Reliability
Reliability as external consistency holds that a test is reliable if, after being administered
several times, it gives relatively consistent results. The methods included in this group are the
test-retest method and the parallel forms method.
Table 6.1 Test-Retest and Parallel Forms Methods

No | Method | Procedure | Technique
1 | Test-retest | The same test is given twice to the same students at different times | Product-moment correlation (between scores on test 1 and test 2)
2 | Parallel forms | Two similar (parallel) tests are given to the same group of learners | Product-moment correlation (between scores on instrument 1 and instrument 2)
6.1.1. Test-Retest Reliability
Test-retest reliability is used to assess the consistency of a measure from one time to
another: the reliability of an achievement test is estimated by administering the same test
repeatedly. The weakness of this method is that if the time interval is too short, learners may
still remember the material that was tested, so the second test result may be better than the
first.
The reliability coefficient in this case is simply the correlation between the scores
obtained by the same persons on the two administrations of the test. If the first test result
parallels the second, the test is said to be reliable. The analysis is done by finding the
correlation between the first and second test results, using the Pearson product-moment
correlation coefficient (r). The value of r always falls within the range -1 to +1.
Example:

No | Student's name | Score test 1 (X) | Score test 2 (Y)
1 | Agustina | 78 | 80
2 | Feby | 80 | 85
3 | Antoni | 77 | 80
4 | Chandra | 90 | 85
5 | Dionisius | 70 | 75
6 | Fitriani | 73 | 78
... | etc.
The formula:

$$r_{XY} = \frac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left\{N\sum X^{2} - \left(\sum X\right)^{2}\right\}\left\{N\sum Y^{2} - \left(\sum Y\right)^{2}\right\}}}$$

Description:
N = number of students
X = score on test 1
Y = score on test 2
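As a concrete check, the raw-score formula above can be computed directly. The sketch below uses the six example rows from the table; it is an added illustration, not part of the original material.

```python
import math

# Scores from the test-retest example above.
x = [78, 80, 77, 90, 70, 73]  # test 1
y = [80, 85, 80, 85, 75, 78]  # test 2

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Raw-score form of the Pearson product-moment correlation coefficient.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(f"test-retest reliability r = {r:.2f}")
```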
6.1.2. Parallel Forms Reliability
Parallel forms reliability is used to assess the consistency of the results of two tests
constructed in the same way from the same content domain. This method requires two sets of
questions that have the same goals, level of difficulty, and composition of material, but
different items; in other words, the two tests must be parallel. The reliability coefficient is
obtained by correlating the results of the first test with the results of the second.
Example:

No | Student's name | Result of instrument 1 (X) | Result of instrument 2 (Y)
1 | Fransiska | 78 | 80
2 | Johnson | 80 | 85
3 | Leona | 77 | 80
4 | Ratya | 90 | 85
5 | Febriyanti | 70 | 75
6 | Karmila | 73 | 78
... | etc.
The formula:

$$r_{XY} = \frac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left\{N\sum X^{2} - \left(\sum X\right)^{2}\right\}\left\{N\sum Y^{2} - \left(\sum Y\right)^{2}\right\}}}$$

Description:
N = number of students
X = score from instrument test 1
Y = score from instrument test 2
6.2. Internal Consistency Reliability
Reliability as internal consistency takes the view that a test is reliable if the measurement
results are consistent between test items. The test-retest and parallel forms methods have the
disadvantage that they are time consuming; in most cases the researcher wants to estimate
reliability from a single administration of a test. This requirement has led to measures of
internal consistency, or homogeneity. Internal consistency measures consistency within the
tool itself, and several internal consistency methods exist. All internal consistency
measurements have one thing in common: the measurement is based on the results of a single
administration. The split-half technique (using the Spearman-Brown, Rulon/Guttman, or
Flanagan formula) and Cronbach's Alpha method are commonly used to estimate internal
consistency reliability; in practice the calculations can be carried out with statistical software
such as SPSS, or with a spreadsheet.
6.2.1. Split-Half Reliability Method
In the split-half reliability method, the test is first divided into two equivalent halves,
and the correlation coefficient between scores on the two half-tests is found. This correlation
coefficient denotes the reliability of the half-test; the reliability coefficient of the whole
test is then estimated from it by various formulas. The measuring instrument can be divided
into two halves in a number of ways, but the usual way is to find the correlation coefficient
between scores on the odd-numbered and the even-numbered items. The whole-test
reliability can then be estimated using the following formulas:
a. Spearman-Brown Formula
The Spearman-Brown formula estimates the reliability of a test n times as long as one
for which a self-correlation is known; for split halves, n = 2. From the reliability of the
half-test, the reliability coefficient of the whole test is estimated by the following
Spearman-Brown formula:

$$r_{tt} = \frac{2\,r_{hh}}{1 + r_{hh}}$$

Where,
rtt = reliability of the total test estimated from the reliability of one of its halves (reliability
coefficient of the whole test)
rhh = self-correlation of a half-test (reliability coefficient of the half-test)
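The sketch below illustrates the odd/even split and the Spearman-Brown step on a small invented item-score matrix; the data and variable names are assumptions chosen purely for illustration.

```python
import numpy as np

# Invented item-score matrix: rows are 6 students, columns are 8 items
# (1 = correct), with items ordered roughly from easiest to hardest.
scores = np.array([
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
])

# Total the odd-numbered and even-numbered items for each student.
odd_half  = scores[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
even_half = scores[:, 1::2].sum(axis=1)   # items 2, 4, 6, 8

# Correlate the two half-tests, then step up with Spearman-Brown (n = 2).
r_hh = np.corrcoef(odd_half, even_half)[0, 1]
r_tt = 2 * r_hh / (1 + r_hh)
print(f"half-test r_hh = {r_hh:.2f}, whole-test r_tt = {r_tt:.2f}")
```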
b. Rulon/Guttman's Formula
An alternative method for finding split-half reliability was developed by Rulon. It requires
only the variance of the differences between each person's scores on the two half-tests
and the variance of the total scores. These two values are substituted into the following
formula, which yields the reliability of the whole test directly:

$$r_{tt} = 1 - \frac{SD_{d}^{2}}{SD_{x}^{2}}$$

Where,
rtt = reliability of the test
SDd = standard deviation of the difference scores
SDx = standard deviation of the scores on the whole test
c. Flanagan Formula
Flanagan gave a parallel formula for finding reliability using the split-half method.
Flanagan's formula is:

$$r_{tt} = 2\left(1 - \frac{SD_{1}^{2} + SD_{2}^{2}}{SD_{t}^{2}}\right)$$

Where,
rtt = reliability of the test
SD1 = standard deviation of the scores on the 1st half
SD2 = standard deviation of the scores on the 2nd half
SDt = standard deviation of the scores on the whole test
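Continuing the split-half sketch above, the Rulon and Flanagan estimates can be computed from the same half-test totals; as before, the numbers are invented for illustration, and population variances (NumPy's default ddof=0) are used consistently.

```python
import numpy as np

# Half-test totals carried over from the Spearman-Brown sketch above.
odd_half  = np.array([4, 3, 2, 2, 1, 1])
even_half = np.array([3, 3, 2, 1, 1, 0])
total = odd_half + even_half

# Rulon/Guttman: one minus the ratio of the variance of the half-score
# differences to the variance of the total scores.
diff = odd_half - even_half
r_rulon = 1 - diff.var() / total.var()

# Flanagan: based on the variances of the two halves and of the whole test.
r_flanagan = 2 * (1 - (odd_half.var() + even_half.var()) / total.var())
print(f"Rulon r = {r_rulon:.2f}, Flanagan r = {r_flanagan:.2f}")
```

The two estimates agree here, as expected: the Rulon and Flanagan formulas are algebraically equivalent ways of writing the same split-half reliability.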
6.2.2. Cronbach's Alpha Method
Cronbach's Alpha is mathematically equivalent to the average of all possible split-half
estimates. For a test of k items with item-score variances $SD_{i}^{2}$ and total-score
variance $SD_{t}^{2}$, it is given by

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i} SD_{i}^{2}}{SD_{t}^{2}}\right)$$

In practice, Cronbach's Alpha (α) is usually calculated with statistical software such as SPSS.
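A minimal implementation of the formula above, applied to the same invented item-score matrix used in the split-half sketches; the function name is my own choice, not a standard API.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an item-score matrix (rows = students,
    columns = items). Population variances (ddof=0) are used consistently."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0)        # variance of each item
    total_var = scores.sum(axis=1).var()  # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# The same invented 6-student, 8-item matrix as in the split-half sketches.
scores = np.array([
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
])
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")
```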
References:
Clay, B. (2001). Is This a Trick Question? A Short Guide to Writing Effective Test Questions.
Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological
Bulletin.
Garrison, C., & Ehringhaus, M. (1995). Formative and Summative Assessment in the Classroom.
Gronlund, N. E. (1981). Measurement and Evaluation. New York: Macmillan Publishing Co.
Popham, W. J. (1981). Modern Educational Measurement. Englewood Cliffs, NJ: Prentice Hall,
Inc.
Purwanto. (2008). Evaluasi Hasil Belajar. Surakarta: Pustaka Pelajar.
Curriculum Vitae
Kadek Ayu Astiti, S. Pd., M. Pd., was born in Singaraja on September 28, 1988.
She is the second child of the couple Ni Ketut Sudi and Made Suarsini.
Website address: www.kadekayuastiti.blogspot.com. History of
education: elementary school No. 6 Kampung Baru, Singaraja-Bali; SMP
Negeri 3 Singaraja-Bali; SMA N 1 Singaraja-Bali; S1 in Physics Education
at Ganesha University of Education; S2 in Science Education at Ganesha
University of Education. Employment history: laboratory staff at SMP N 1
Singaraja-Bali (2010-2011), teacher at SMP N 1 Singaraja-Bali (2011-2013), lecturer in
physics education courses at the University of Nusa Cendana (2014-present).

More Related Content

What's hot

New trends in evaluation v kamat
New trends in evaluation v kamatNew trends in evaluation v kamat
New trends in evaluation v kamatVasudha Kamat
 
Continuous Assessment
Continuous AssessmentContinuous Assessment
Continuous AssessmentManuel Reyes
 
Continuous Assessment System (CAS In Nepal)
Continuous Assessment System (CAS In Nepal)Continuous Assessment System (CAS In Nepal)
Continuous Assessment System (CAS In Nepal)Ravi Maharjan
 
Assessment Assumptions
Assessment AssumptionsAssessment Assumptions
Assessment AssumptionsJason Rhode
 
Continuous Assessment Component (CAC) of SEA
Continuous Assessment Component (CAC) of SEAContinuous Assessment Component (CAC) of SEA
Continuous Assessment Component (CAC) of SEAMoeEduTT
 
Examination and Evaluation
Examination and EvaluationExamination and Evaluation
Examination and Evaluationjagannath Dange
 
Understanding the concept of continuous and comprehensive evauation.
Understanding the concept of continuous and comprehensive evauation.Understanding the concept of continuous and comprehensive evauation.
Understanding the concept of continuous and comprehensive evauation.Sarvodaya Kanya Vidhyalaya
 
Assessment for learning
Assessment for learningAssessment for learning
Assessment for learningAtul Thakur
 
PHYSICS ASSESSMENT General Types of Assessment and The Types of Scales
PHYSICS ASSESSMENT General Types of Assessment and The Types of ScalesPHYSICS ASSESSMENT General Types of Assessment and The Types of Scales
PHYSICS ASSESSMENT General Types of Assessment and The Types of ScalesMillathina Puji Utami
 
The value of continuous assessment strategies in students’ learning of geogra...
The value of continuous assessment strategies in students’ learning of geogra...The value of continuous assessment strategies in students’ learning of geogra...
The value of continuous assessment strategies in students’ learning of geogra...Alexander Decker
 
Assessment of Student Learning
Assessment of Student LearningAssessment of Student Learning
Assessment of Student LearningDigiZen
 
The process and purpose of evaluation
The process and purpose of evaluationThe process and purpose of evaluation
The process and purpose of evaluationahmedabbas1121
 
teacher made test Vs standardized test
 teacher made test Vs standardized test teacher made test Vs standardized test
teacher made test Vs standardized testathiranandan
 
Assessment literacy for effective classroom-based assessment
Assessment literacy for effective classroom-based assessmentAssessment literacy for effective classroom-based assessment
Assessment literacy for effective classroom-based assessmentEddy White, Ph.D.
 

What's hot (20)

New trends in evaluation v kamat
New trends in evaluation v kamatNew trends in evaluation v kamat
New trends in evaluation v kamat
 
Assessment strategies
Assessment strategiesAssessment strategies
Assessment strategies
 
Continuous Assessment
Continuous AssessmentContinuous Assessment
Continuous Assessment
 
Pros and cons of school based assessment in india by dr. thanuja.k converted
Pros and cons of school based assessment in india by dr. thanuja.k convertedPros and cons of school based assessment in india by dr. thanuja.k converted
Pros and cons of school based assessment in india by dr. thanuja.k converted
 
Continuous Assessment System (CAS In Nepal)
Continuous Assessment System (CAS In Nepal)Continuous Assessment System (CAS In Nepal)
Continuous Assessment System (CAS In Nepal)
 
Assessment Assumptions
Assessment AssumptionsAssessment Assumptions
Assessment Assumptions
 
Concept of Classroom Assessment
Concept of Classroom AssessmentConcept of Classroom Assessment
Concept of Classroom Assessment
 
How, what and why of assessment, by dr. thanujakarimbana converted
How, what and why of assessment, by dr. thanujakarimbana convertedHow, what and why of assessment, by dr. thanujakarimbana converted
How, what and why of assessment, by dr. thanujakarimbana converted
 
Continuous Assessment Component (CAC) of SEA
Continuous Assessment Component (CAC) of SEAContinuous Assessment Component (CAC) of SEA
Continuous Assessment Component (CAC) of SEA
 
Examination and Evaluation
Examination and EvaluationExamination and Evaluation
Examination and Evaluation
 
Definition of Assessment,
Definition of Assessment,Definition of Assessment,
Definition of Assessment,
 
Understanding the concept of continuous and comprehensive evauation.
Understanding the concept of continuous and comprehensive evauation.Understanding the concept of continuous and comprehensive evauation.
Understanding the concept of continuous and comprehensive evauation.
 
Assessment for learning
Assessment for learningAssessment for learning
Assessment for learning
 
PHYSICS ASSESSMENT General Types of Assessment and The Types of Scales
PHYSICS ASSESSMENT General Types of Assessment and The Types of ScalesPHYSICS ASSESSMENT General Types of Assessment and The Types of Scales
PHYSICS ASSESSMENT General Types of Assessment and The Types of Scales
 
The value of continuous assessment strategies in students’ learning of geogra...
The value of continuous assessment strategies in students’ learning of geogra...The value of continuous assessment strategies in students’ learning of geogra...
The value of continuous assessment strategies in students’ learning of geogra...
 
Assessment of Student Learning
Assessment of Student LearningAssessment of Student Learning
Assessment of Student Learning
 
Educational evaluation -a brief conceptual overview
Educational evaluation -a brief conceptual overviewEducational evaluation -a brief conceptual overview
Educational evaluation -a brief conceptual overview
 
The process and purpose of evaluation
The process and purpose of evaluationThe process and purpose of evaluation
The process and purpose of evaluation
 
teacher made test Vs standardized test
 teacher made test Vs standardized test teacher made test Vs standardized test
teacher made test Vs standardized test
 
Assessment literacy for effective classroom-based assessment
Assessment literacy for effective classroom-based assessmentAssessment literacy for effective classroom-based assessment
Assessment literacy for effective classroom-based assessment
 

Similar to teaching material

Test Development and Evaluation
Test Development and Evaluation Test Development and Evaluation
Test Development and Evaluation HennaAnsari
 
ASSESSMENT IN LEARNING 1-LESSONS 1-4 (1).ppt
ASSESSMENT IN LEARNING 1-LESSONS 1-4 (1).pptASSESSMENT IN LEARNING 1-LESSONS 1-4 (1).ppt
ASSESSMENT IN LEARNING 1-LESSONS 1-4 (1).pptOscarAncheta
 
What-is-Educational-Assessment.pptx
What-is-Educational-Assessment.pptxWhat-is-Educational-Assessment.pptx
What-is-Educational-Assessment.pptxANIOAYRochelleDaoaya
 
2015 PGDT 423 (1).pptx
2015 PGDT 423 (1).pptx2015 PGDT 423 (1).pptx
2015 PGDT 423 (1).pptxsolomon554003
 
Evaluation in Education
Evaluation in Education Evaluation in Education
Evaluation in Education HennaAnsari
 
Assessment and Evaluation
Assessment and EvaluationAssessment and Evaluation
Assessment and EvaluationSuresh Babu
 
Assessment for learning by Dr. Goggi gupta
Assessment for learning by Dr. Goggi guptaAssessment for learning by Dr. Goggi gupta
Assessment for learning by Dr. Goggi guptagoggigupta
 
K to 12 classroom assessment ppt
K to 12 classroom assessment pptK to 12 classroom assessment ppt
K to 12 classroom assessment pptCarlo Magno
 
Assessment for learning chapter 1 - copy-converted
Assessment for learning chapter 1  - copy-convertedAssessment for learning chapter 1  - copy-converted
Assessment for learning chapter 1 - copy-convertedgoggigupta
 
Ict and assessment of learning
Ict and assessment of learningIct and assessment of learning
Ict and assessment of learningerwin marlon sario
 
Chapter 8 reporting by group 6 (autosaved) (autosaved)
Chapter 8 reporting by group 6 (autosaved) (autosaved)Chapter 8 reporting by group 6 (autosaved) (autosaved)
Chapter 8 reporting by group 6 (autosaved) (autosaved)Christine Watts
 
assessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptxassessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptxMarjorie Malveda
 
K to 12 Grading Sheet Deped Order No. 8 S. 2015 PPT presentation
K to 12 Grading Sheet Deped Order No. 8 S. 2015 PPT presentationK to 12 Grading Sheet Deped Order No. 8 S. 2015 PPT presentation
K to 12 Grading Sheet Deped Order No. 8 S. 2015 PPT presentationChuckry Maunes
 
CONTINIOUS AND COMPREHENSIVE ASSESSMENT IN EDUCATION
CONTINIOUS AND COMPREHENSIVE ASSESSMENT IN EDUCATION CONTINIOUS AND COMPREHENSIVE ASSESSMENT IN EDUCATION
CONTINIOUS AND COMPREHENSIVE ASSESSMENT IN EDUCATION Shisira Bania
 
Types of Assessment in Classroom
Types of Assessment in ClassroomTypes of Assessment in Classroom
Types of Assessment in ClassroomS. Raj Kumar
 

Similar to teaching material (20)

Unit 301 Essay
Unit 301 EssayUnit 301 Essay
Unit 301 Essay
 
Test Development and Evaluation
Test Development and Evaluation Test Development and Evaluation
Test Development and Evaluation
 
ASSESSMENT IN LEARNING 1-LESSONS 1-4 (1).ppt
ASSESSMENT IN LEARNING 1-LESSONS 1-4 (1).pptASSESSMENT IN LEARNING 1-LESSONS 1-4 (1).ppt
ASSESSMENT IN LEARNING 1-LESSONS 1-4 (1).ppt
 
What-is-Educational-Assessment.pptx
What-is-Educational-Assessment.pptxWhat-is-Educational-Assessment.pptx
What-is-Educational-Assessment.pptx
 
Essay On Assessment For Learning
Essay On Assessment For LearningEssay On Assessment For Learning
Essay On Assessment For Learning
 
2015 PGDT 423 (1).pptx
2015 PGDT 423 (1).pptx2015 PGDT 423 (1).pptx
2015 PGDT 423 (1).pptx
 
Evaluation in Education
Evaluation in Education Evaluation in Education
Evaluation in Education
 
Assessment and Evaluation
Assessment and EvaluationAssessment and Evaluation
Assessment and Evaluation
 
Assessment for learning by Dr. Goggi gupta
Assessment for learning by Dr. Goggi guptaAssessment for learning by Dr. Goggi gupta
Assessment for learning by Dr. Goggi gupta
 
Dup(01)portfolio (1)
Dup(01)portfolio (1)Dup(01)portfolio (1)
Dup(01)portfolio (1)
 
K to 12 classroom assessment ppt
K to 12 classroom assessment pptK to 12 classroom assessment ppt
K to 12 classroom assessment ppt
 
Types Of Assessment
Types Of AssessmentTypes Of Assessment
Types Of Assessment
 
Assessment for learning chapter 1 - copy-converted
Assessment for learning chapter 1  - copy-convertedAssessment for learning chapter 1  - copy-converted
Assessment for learning chapter 1 - copy-converted
 
Ict and assessment of learning
Ict and assessment of learningIct and assessment of learning
Ict and assessment of learning
 
language and literature assessment
language and literature assessmentlanguage and literature assessment
language and literature assessment
 
Chapter 8 reporting by group 6 (autosaved) (autosaved)
Chapter 8 reporting by group 6 (autosaved) (autosaved)Chapter 8 reporting by group 6 (autosaved) (autosaved)
Chapter 8 reporting by group 6 (autosaved) (autosaved)
 
assessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptxassessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptx
 
K to 12 Grading Sheet Deped Order No. 8 S. 2015 PPT presentation
K to 12 Grading Sheet Deped Order No. 8 S. 2015 PPT presentationK to 12 Grading Sheet Deped Order No. 8 S. 2015 PPT presentation
K to 12 Grading Sheet Deped Order No. 8 S. 2015 PPT presentation
 
CONTINIOUS AND COMPREHENSIVE ASSESSMENT IN EDUCATION
CONTINIOUS AND COMPREHENSIVE ASSESSMENT IN EDUCATION CONTINIOUS AND COMPREHENSIVE ASSESSMENT IN EDUCATION
CONTINIOUS AND COMPREHENSIVE ASSESSMENT IN EDUCATION
 
Types of Assessment in Classroom
Types of Assessment in ClassroomTypes of Assessment in Classroom
Types of Assessment in Classroom
 

teaching material

  • 1. Teaching material PHYSICS EDUCATIONAL ASSESMENT BY: KADEK AYU ASTITI, S. PD., M. PD. NIP. 20140928 201404 2 002 Support by: Dana PGMIPAU Tahun 2014 PHYSICS EDUCATION PROGRAM MATHEMATIC AND SCIENCE DEPARTMENT FACULTY OF TEACHER TRAINING AND EDUCATION NUSA CENDANA UNIVERSITY 2014
  • 2. PREFACE A notable concern of many teachers is that they frequently have the task of constructing assessment to reflect on learning but have relatively little training or information to rely on in this task. Assessment so important in teaching and learning. Assessment is very important for our students, because it shows them where they are falling short. That it why teachers should always discuss exams with students afterwards, to show them what the right answers were, and where they made mistakes. For the same reason, students must be given their marks, and their exam scripts, as soon as possible. Assessment for Learning focuses on the opportunities to develop students' ability to evaluate themselves, to make judgements about their own performance and improve upon it. It makes use of authentic assessment methods and offers lots of opportunities for students to develop their skills through formative assessment using summative assessment sparingly. To do an effectively assessment, so teacher must be understand the type of assessment, type of scale assessment, method of construct the test, validity and reliability tests. Each aspect is discussed in this sourcesbook. To help the teacher do assessing, so part one contains information the meaning and the type of assessment. Concerning general test construction and introduces the six levels of intellectual understanding: knowledge, comprehension, application, analysis, synthesis, and evaluation. These levels of understanding assist in categorizing test questions, with knowledge as the lowest level. Part Two of the information sourcebook is devoted to actual test question construction, test of validity and reliability. Five test item types are discussed: multiple choice, true-false, matching, completion, and essay. Information covers the appropriate use of each item type, advantages and disadvantages of each item type, and characteristics of well written items. Suggestions for addressing higher order thinking skills for each item type are also presented. This sourcebook was developed to accomplish three outcomes: 1) Teachers will know the meaning and follow appropriate principles for developing and using assessment methods in their teaching, avoiding common pitfalls in student assessment, 2) Teachers will be able to identify and accommodate the limitations of different informal and formal assessment methods. 3) Teachers will gain an awareness that certain assessment approaches can be incompatible with certain instructional goals. Kadek Ayu Astiti, S. Pd., M. Pd.
  • 3. Contens Preface i CHAPTER I TYPE OF ASSESSMENT 1 1.1. Difference measurement, assessment and evaluation 1.2 General Type of Assessment 1.3 Norm Referenced Assessment and Criterion Referenced Assessment CHAPTER II EVALUATION OF LEARNING OBJECTS 2.1 Cognitive Learning Outcomes 2.2 Affective Learning Outcomes 2.3 Psychomotor Learning Outcomes 2.4 The Type Value Scale CHAPTER III LEARNING ASSESSMENT 3.1 Objective 3.2 Essay CHAPTER IV NON TEST ASSESSMENT 4.1 Observation 4.2 Interview 4.3 Questionnare 4.4 Portofolios 4.5 Project CHAPTER V Validity Test 5.1 Content Validity 5.2 Criterion Validity 5.3 Construct Validity CHAPTER VI RELIABILITY TEST 6.1 External Consistency Reliability
  • 4. 6.2 Internal Consistency Reliability CHAPTER I TYPES OF ASSESSMENT Purpose: After learning this matter, students are expected to: - Be able to explain the definition of assessment - Be able to explain the different between measurement, assessment, evaluation - Mention the types of assessment (summative and formative) - Understand the concept of criterion referenced test and norm referenced framework 1.1. Difference measurement, assessment and evaluation There is a lot of confusion over these three terms as well as other terms associated with measurenment, assessment, and evaluation. The following is an understanding of each of these terms: Measurement, beyond its general definition, refers to the set of procedures and the principles for how to use the procedures in educational tests and assessments. Some of the derived scores, standard scores, etc. A measurement takes place when a “test” is given and a “score” is obtained. If the test collects quantitative data, the score is a number. If the test collects qualitative data, the score may be a phrase or word such as “excellent.” Assessment is a process by which information is obtained relative to some known objective or goal. As noted in my definition of test, an assessment may include a test, but also includes methods such as observations, interviews, behavior monitoring, etc. Evaluation: focuses on grades and may reflect classroom components other than course content and mastery level. Evaluation are procedures used to determine whether the subject (i.e. student) meets a preset criteria, such as qualifying for special education services. This uses assessment (remember that an assessment may be a test) to make a determination of qualification in accordance with a predetermined criteria.
  • 5. For the purpose of schematic representation, the three concepts of evaluation, measurement and testing have traditionally been demonstrated in three concentric circles of varying sizes. This is the relationship among these concepts. Evaluation Assessment Measurement Figure 1.1 relationship measurement, assessment and evaluation Assessment plays a major role in how students learn, their motivation to learn, and how teachers teach. Assessment is used for various purposes. • Assessment for learning: where assessment helps teachers gain insight into what students understand in order to plan and guide instruction, and provide helpful feedback to students. • Assessment as learning: where students develop an awareness of how they learn and use that awareness to adjust and advance their learning, taking an increased responsibility for their learning. • Assessment of learning: where assessment informs students, teachers and parents, as well as the broader educational community, of achievement at a certain point in time In order to celebrate success, plan interventions and support continued progress. Assessment must be planned with its purpose in mind. Assessment for, as and of learning all have a role to play in supporting and improving student learning, and must be appropriately balanced. The most important part of assessment is the interpretation and use of the information that is gleaned for its intended purpose. Assessment is embedded in the learning process. It is tightly interconnected with curriculum and instruction. As teachers and students
  • 6. work towards the achievement of curriculum outcomes, assessment plays a constant role in informing instruction, guiding the student’s next steps, and checking progress and achievement. Teachers use many different processes and strategies for classroom assessment, and adapt them to suit the assessment purpose and needs of individual student. Table 1.1 Classroom assessment: from … to … No From To 1 Classroom tests disconnected from the focus of instruction Classroom tests refecting the written and taught curriculum 2 Assessment using only selected respons formats Assessment method selected intentionally to reflect specific kinds of learning target 3 Mystery assessment, where students don’t know in advances what they are accountable for learning Transparency in assessments, where students know in advance what they will be held accountable for learning 4 All assessment and assignments, including practice, “count” toward the grade Some assessment an assignment “count” toward the grade, others are for practice or other formative use 5 Students as passive participant in the assessment process Students as active users of assessments as learning experiences 6 Students not finding out until the graded event what they are good at and what they need to work on Students being able to identify theirs strengths and areas for futher study during learning 1.2. General Type of Assessment 1.2.1. Summative assessment Summative assessment are cumulative evaluation used to measure student growth after instruction and are generally given at the end of a course in order to determine wheter long term learning goals have been met. Summative assessment is assessments that provide evidence of student achievement for the purpose of making a judgment about student competence or program effectiveness. Typically the summative evaluation concentrates on learner outcome rather than only the program of instruction. It is means to determine a student’s mastery and understanding of information, skills, concept and process. Summative assessment occur at the end of a formal learning experience. Either a class or a program and may include a variety of activities example test, demonstration, portofolios, internship, clinical, and capstone project. Summative assement is a high stakes type of assessment for the purpose of the making final judgment about student achivment and instructional effectiveness.
  • 7. By the time summative assessment occur student haved typically exit the learning mode. Teachers/schools can use these assessments to identify strengths and weaknesses of curriculum and instruction, with improvements affecting the next year's/term's students. Summative assessment are given periodically to determine at a particular point in time what students know and do not know. Many associate summative assessments only with standardized tests such as state assessments, but they are also used at and are an important part of district and classroom programs. Summative assessment at the district and classroom level is an accountability measure that is generally used as part of the grading process. The list is long, but here are some examples of summative assessments: a) State assessments b) District benchmark or interim assessments c) End-of-unit or chapter tests d) End-of-term or semester exams e) Scores that are used for accountability of schools (AYP) and students (report card grades). The key is to think of summative assessment as a means to gauge, at a particular point in time, student learning relative to content standards. Although the information gleaned from this type of assessment is important, it can only help in evaluating certain aspects of the learning process. Because they are spread out and occur after instruction every few weeks, months, or once a year, summative assessments are tools to help evaluate the effectiveness of programs, school improvement goals, alignment of curriculum, or student placement in specific programs. Summative assessments happen too far down the learning path to provide information at the classroom level and to make instructional adjustments and interventions during the learning process. It takes formative assessment to accomplish this. The goal of summative assessment is to evaluate student learning at the end of an instructional unit by comparing it against some standard or benchmark. Information from summative assessments can be used formatively when students or faculty use it to guide their efforts and activities in subsequent courses. 1.2.2. Formative assessment Formative Assessment is part of the instructional process. Formative assessment is an integral part of teaching and learning. Formative assessment ongoing assessments, reviews, and observations in a classroom. Teachers use formative assessment to improve instructional methods and student feedback throughout the teaching and learning process. For example, if a teacher observes that some students do not grasp a concept, she or he can design a review activity or use a different instructional strategy. Likewise, students can monitor their progress
  • 8. with periodic quizzes and performance tasks. The results of formative assessments are used to modify and validate instruction. Formative assessment occurs in the short term, as learners are in the process of making meaning of new content and of integrating it into what they already know. When in corporated into classroom practice, it providesthe information needed to adjust teaching and learning while they are happening. In this sense formative assessment informs both teachers and students about student understanding at a point when timely adjustment can be made. These adjustment help to ensure student achieve, targeted standards based learning goal within a set time frame. Although formative assessment strategies appear in a variety of formats, there are some distinct ways to distinguish them from summative assessments. Formative assessment helps teachers determine next steps during the learning process as the instruction approaches the summative assessment of student learning. Some of the instructional strategies that can be used formatively include the following: 1. Criteria and goal setting with students engages them in instruction and the learning process by creating clear expectations. In order to be successful, students need to understand and know the learning target/goal and the criteria for reaching it. Establishing and defining quality work together, asking students to participate in establishing norm behaviors for classroom culture, and determining what should be included in criteria for success are all examples of this strategy. Using student work, classroom tests, or exemplars of what is expected helps students understand where they are, where they need to be, and an effective process for getting there. 2. Observations go beyond walking around the room to see if students are on task or need clarification. Observations assist teachers in gathering evidence of student learning to inform instructional planning. This evidence can be recorded and used as feedback for students about their learning or as anecdotal data shared with them during conferences. 3. Questioning strategies should be embedded in lesson/unit planning. Asking better questions allows an opportunity for deeper thinking and provides teachers with significant insight into the degree and depth of understanding. Questions of this nature engage students in classroom dialogue that both uncovers and expands learning. An “exit slip” at the end of a class period to determine students’ understanding of the day’s lesson or quick checks during instruction such as “thumbs up/down” or “red/green” (stop/go) cards are also examples of questioning strategies that elicit immediate information about student learning. Helping students ask better questions is another aspect of this formative assessment strategy. 4. Self and peer assessment helps to create a learning community within a classroom. Students who can refect while engaged in metacognitive thinking are involved in their
  • 9. learning. When students have been involved in criteria and goal setting, self-evaluation is a logical step in the learning process. With peer evaluation, students see each other as resources for understanding and checking for quality work against previously established criteria. 5. Student record keeping helps students better understand their own learning as evidenced by their classroom work. This process of students keeping ongoing records of their work not only engages students, it also helps them, beyond a “grade,” to see where they started and the progress they are making toward the learning goal. All of these strategies are integral to the formative assessment process, and they have been suggested by models of effective middle school instruction. 6. Balancing Assessment. As teachers gather information/data about student learning, several categories may be included. In order to better understand student learning, teachers need to consider information about the products (paper or otherwise) students create and tests they take, observational notes, and reflections on the communication that occurs between teacher and student or among students. When a comprehensive assessment program at the classroom level balances formative and summative student learning/achievement information, a clear picture emerges of where a student is relative to learning targets and standards. Students should be able to articulate this shared information about their own learning. When this happens, student-led conferences, a formative assessment strategy, are valid. The more we know about individual students as they engage in the learning process, the better we can adjust instruction to ensure that all students continue to achieve by moving forward in their learning. The goal of formative assessment is to monitor student learning to provide ongoing feedback that can be used by instructors to improve their teaching and by students to improve their learning. More specifically, formative assessments: • help students identify their strengths and weaknesses and target areas that need work • help faculty recognize where students are struggling and address problems immediately Formative assessments are generally low stakes, which means that they have low or no point value. Examples of formative assessments include asking students to: • draw a concept map in class to represent their understanding of a topic • submit one or two sentences identifying the main point of a lecture • turn in a research proposal for early feedback
1.3. Norm Referenced Assessment and Criterion Referenced Assessment
When we look at the types of assessment instruments, we can generally classify them into two main groups: criterion-referenced assessments and norm-referenced assessments.
1.3.1. Norm Referenced Assessment
Linn and Gronlund (2000) define a norm-referenced assessment as a test or other type of assessment designed to provide a measure of performance that is interpretable in terms of an individual's relative standing in some known group. Norm-referenced tests allow us to compare a student's skills to those of others in his or her age group. Norm-referenced tests are developed by creating the test items and then administering the test to a group of students that will be used as the basis of comparison. The essential characteristic of norm referencing is that students are awarded their grades on the basis of their ranking within a particular cohort. Norm referencing involves fitting a ranked list of students’ ‘raw scores’ to a pre-determined distribution for awarding grades. Usually, grades are spread to fit a ‘bell curve’ (a ‘normal distribution’ in statistical terminology), either by qualitative, informal rough-reckoning or by statistical techniques of varying complexity. For large student cohorts (such as in senior secondary education), statistical moderation processes are used to adjust or standardise student scores to fit a normal distribution. Norm-referenced standardized tests compare a student's performance to that of a norming or sample group who are in the same grade or are the same age. Student performance is communicated in percentile ranks, grade equivalent scores, normal curve equivalents, scaled scores, or stanine scores.
1.3.2. Criterion Referenced Assessment
In a criterion-referenced assessment, a student's performance is measured against a standard. One form of criterion-referenced assessment is the benchmark, a description of a key task that students are expected to be able to perform. In contrast to norm referencing, criterion-referenced assessment, as the name implies, involves determining a student’s grade by comparing his or her achievements with clearly stated criteria for learning outcomes and clearly stated standards for particular levels of performance. Linn and Gronlund (2000) define a criterion-referenced assessment as a test or other type of assessment designed to provide a measure of performance that is interpretable in terms of a clearly defined and delimited domain of learning tasks.
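The contrast between the two approaches can be made concrete with a short sketch. The Python fragment below is illustrative only: the raw marks, grade quotas, and cut scores are assumed values, not prescribed ones. It grades the same set of raw scores twice, once by ranking students against a fixed distribution (norm referencing) and once by comparing each score with fixed standards (criterion referencing).

# Illustrative sketch: the same raw scores graded two ways.
# All marks, quotas, and cut scores below are assumptions for demonstration.

raw_scores = {"Ana": 58, "Ben": 72, "Cia": 66, "Dan": 81, "Eko": 74, "Fia": 69}

def norm_referenced(scores, quotas=(("A", 0.2), ("B", 0.3), ("C", 0.3), ("D", 0.2))):
    """Grades depend on a student's rank within the cohort."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    grades, start = {}, 0
    for grade, share in quotas:
        n = round(share * len(ranked))
        for name in ranked[start:start + n]:
            grades[name] = grade
        start += n
    for name in ranked[start:]:          # any remainder from rounding
        grades[name] = quotas[-1][0]
    return grades

def criterion_referenced(scores, cuts=((80, "A"), (70, "B"), (60, "C"), (0, "D"))):
    """Grades depend only on fixed standards, not on other students."""
    return {name: next(g for c, g in cuts if s >= c) for name, s in scores.items()}

print(norm_referenced(raw_scores))
print(criterion_referenced(raw_scores))

Note that under norm referencing a fixed share of students must receive each grade regardless of their absolute performance, which is exactly the fairness concern discussed later in this section.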
Unlike norm referencing, there is no pre-determined grade distribution to be generated, and a student’s grade is in no way influenced by the performance of others. Theoretically, all students within a particular cohort could receive very high (or very low) grades depending solely on the levels of individuals’ performances against the established criteria and standards. The goal of criterion referencing is to report student achievement against objective reference points that are independent of the cohort being assessed. Criterion referencing can lead to simple pass/fail grading schemes, such as in determining fitness to practice in professional fields. Criterion referencing can also lead to reporting student achievement or progress on a series of key criteria rather than as a single grade or percentage. Criterion referencing is worth aspiring towards. It requires giving thought to expected learning outcomes; it is transparent for students; and the grades derived should be defensible in reasonably objective terms: students should be able to trace their grades to the specifics of their performance on set tasks. Criterion referencing lays an important framework for student engagement with the learning process and its outcomes. The distinction between criterion- and norm-referenced assessments is that criterion referencing compares one to a standard, while norm referencing compares one to others. The following differences between norm-referenced tests and criterion-referenced tests are adapted from Popham (1975).
Table 1.2. Differences between Criterion Referenced Tests and Norm Referenced Tests
• Purpose
– Criterion referenced tests: To determine whether each student has achieved specific skills or concepts; to find out how much students know before instruction begins and after it has finished.
– Norm referenced tests: To rank each student with respect to the achievement of others in broad areas of knowledge; to discriminate between high and low achievers.
• Content
– Criterion referenced tests: Measure specific skills which make up a designated curriculum. These skills are identified by teachers and curriculum experts. Each skill is expressed as an instructional objective.
– Norm referenced tests: Measure broad skill areas sampled from a variety of textbooks, syllabi, and the judgments of curriculum experts.
• Item characteristics
– Criterion referenced tests: Each skill is tested by at least four items in order to obtain an adequate sample of student performance and to minimize the effect of guessing. The items which test any given skill are parallel in difficulty.
– Norm referenced tests: Each skill is usually tested by fewer than four items. Items vary in difficulty. Items are selected that discriminate between high and low achievers.
• Score interpretation
– Criterion referenced tests: Each individual is compared with a preset standard for acceptable achievement; the performance of other examinees is irrelevant. A student's score is usually expressed as a percentage. Student achievement is reported for individual skills.
– Norm referenced tests: Each individual is compared with other examinees and assigned a score, usually expressed as a percentile, a grade equivalent score, or a stanine. Student achievement is reported for broad skill areas, although some norm-referenced tests do report student achievement for individual skills.
Which of these methods is preferable? Mostly, students’ grades in universities are decided by a mix of both methods, even though there may not be an explicit policy to do so. In fact, the two methods are somewhat interdependent, more so than the brief explanations above might suggest. Logically, norm referencing must rely on some initial criterion referencing, since students’ ‘raw’ scores must presumably be determined in the first instance by assessors who have some objective criteria in mind. Criterion referencing, on the other hand, appears more educationally defensible. But criterion referencing may be very difficult, if not impossible, to implement in a pure form in many disciplines. It is not always possible to be entirely objective and to comprehensively articulate criteria for learning outcomes: some subjectivity in setting and interpreting levels of achievement is inevitable in higher education. This being the case, sometimes the best we can hope for is to compare individuals’ achievements relative to their peers. Norm referencing, on its own and if strictly and narrowly implemented, is undoubtedly unfair. With norm referencing, a student’s grade depends to some extent not only on his or her level of achievement, but also on the achievement of other students. This might lead to obvious inequities if applied without thought to any other considerations. For example, a student who fails in one year may well have passed in another year! The potential for unfairness of this kind is most likely in smaller student cohorts, where norm referencing may force a spread of grades and exaggerate differences in achievement. Alternatively, norm referencing might artificially compress the range of difference that actually exists. Recognising, however, that some degree of subjectivity is inevitable in higher education, it is also worthwhile to monitor grade distributions: in other words, to use a modest process of norm referencing to watch the outcomes of a predominantly criterion-referenced grading model. In doing so, if it is believed too many students are receiving low grades, or too many students are receiving high grades, or the distribution is in some way oddly spread, then this might suggest something is amiss and the assessment process needs
looking at. There may be, for instance, a problem with the overall degree of difficulty of the assessment tasks: not enough challenging examination questions, or too few, or assignment tasks that fail to discriminate between students with differing levels of knowledge and skills. There might also be inconsistencies in the way different assessors are judging student work. Best practice in grading in higher education involves striking a balance between criterion referencing and norm referencing. This balance should be strongly oriented towards criterion referencing as the primary and dominant principle.
Reference:
Bastanfar, A. (2009). Alternatives in assessment. Article. http://www3.telus.net/linguisticsissues/alternatives.
Garrison, C., & Ehringhaus, M. (2010). Formative and summative assessment in the classroom. www.measuredprogress.
Linn, R. L., & Gronlund, N. E. (2000). Measurement and assessment in teaching (8th ed.). Upper Saddle River, NJ: Prentice Hall.
Lynch, B. K. (2001). Rethinking assessment from a critical perspective. Language Testing, 18(4), 351–372.
Popham, W. J. (1975). Educational evaluation. Englewood Cliffs, NJ: Prentice-Hall, Inc.
CHAPTER II
EVALUATION OF LEARNING OBJECTS
Purpose: After learning this material, students are expected to:
- understand what is measured in cognitive aspects
- understand what is measured in affective aspects
- understand what is measured in psychomotor aspects
- be able to explain the kinds of assessment scales
2.1. Cognitive Learning Outcomes
One of the objects of the evaluation of learning outcomes is the cognitive aspect. Test questions should focus on appropriate intellectual activity, ranging from simple recall to problem solving, critical thinking, and reasoning. Cognitive complexity refers to the various levels of learning that can be tested. A good test reflects the goals of the instruction. If the instructor is mainly concerned with students memorizing facts, the test should ask for simple recall of material. If the instructor is trying to develop analytic skills, a test that asks for recall is inappropriate and will cause students to conclude that memorization is the instructor's true goal. In 1956, after extensive research on educational goals, a group of researchers published its findings in a book edited by Dr. Benjamin S. Bloom of the University of Chicago. Bloom’s Taxonomy of Educational Objectives lists six levels of intellectual understanding:
• Knowledge
• Comprehension
• Application
• Analysis
• Synthesis
• Evaluation
Table 2.1. Cognitive complexity, adapted from Clay (2001)
• Knowledge: recognizing and recalling information, including dates, events, persons, places; terms, definitions; facts, principles, theories; methods and procedures. Example questions: Who invented the…? What is meant by…? Where is the…?
• Comprehension: understanding the meaning of information, including restating (in one's own words); translating from one form to another; or interpreting, explaining, and summarizing. Example questions: Restate in your own words…? Convert fractions into…? List three reasons for…?
• Application: applying general rules, methods, or principles to a new situation, including classifying something as a specific example of a general principle or using a formula to solve a problem. Example questions: How is… an example of…? How is… related to…? Why is… significant?
• Analysis: identifying the organization and patterns within a system by identifying its component parts and the relationships among the components. Example questions: What are the parts of…? Classify… according to… Outline/diagram…
• Synthesis: discovering/creating new connections, generalizations, patterns, or perspectives; combining ideas to form a new whole. Example questions: What would you infer from…? What ideas can you add to…? How would you create a…?
• Evaluation: using evidence and reasoned argument to judge how well a proposal would accomplish a particular purpose; resolving controversies or differences of opinion. Example questions: Do you agree…? How would you decide about…? What priority would you give…?
2.2. Affective Learning Outcomes
Affective learning outcomes are learning outcomes related to students' interests, attitudes, and values. The taxonomy of affective learning outcomes was developed by Krathwohl et al., as outlined in Handbook II: The Affective Domain. According to Krathwohl (in Mehrens and Lehmann, 1973), the affective domain consists of: receiving, responding, valuing, organization, and characterization.
Table 2.2. Affective domain guide, adapted from Clay (2001)
• Receiving. If the student must receive information about, or give attention to, a new attitude, value, or belief, then use these key words in objectives, assignments, and evaluations: be alert to, be aware of, be sensitive to, experience, listen to, look at, perceive the existence of, receive information on, take notes on, take notice of, willingly attend.
• Responding. If the student must participate in, or react to, the new attitude, value, or belief in a positive manner: allow others to, answer questions on, contribute to, cooperate with, dialog on, discuss openly, enjoy doing, participate in, reply to, respect those who.
• Valuing. If the student must show some definite involvement in, or commitment to, the new attitude, value, or belief: accept as right, accept as true, affirm belief/trust in, associate himself or herself with, assume as true, consider valuable, decide based on, indicate agreement, influence others, justify based on, seek out more detail.
• Organizing. If the student must integrate the new attitude, value, or belief with the existing organization of attitudes, values, and beliefs, so that it has a position of priority and advocacy: advocate, integrate into life, judge based on, place in value system, prioritize based on, persuade others, systematize.
• Characterization. If the student must fully internalize the new attitude, value, or belief so that it consistently characterizes thought and action: act based on, consistently carry out, consistently practice, fully internalize, be known by others as, be characterized by, sacrifice for, view life based on.
2.3. Psychomotor Learning Outcomes
Psychomotor learning outcomes are learning outcomes related to motor skills and the ability to act individually. Psychomotor behaviors are performed actions that are neuromuscular in nature and demand certain levels of physical dexterity. This kind of assessment is suitable for assessing the achievement of competences that demand that learners perform a specific task, for example an experiment in the laboratory. The taxonomy most often used for psychomotor learning outcomes is Simpson's (Gronlund and Linn, 1990). That taxonomy comprises perception, set, guided response, mechanism, complex overt response, adaptation, and origination.
Table 2.3. Psychomotor domain
• Perception: awareness; the ability to use sensory cues to guide motor activity. This ranges from sensory stimulation, through cue selection, to translation: the use and/or selection of the senses to absorb data for guiding movement. Examples: detects non-verbal communication cues; estimates where a ball will land after it is thrown and moves to the correct location to catch it; adjusts the heat of a stove to the correct temperature by the smell and taste of the food; adjusts the height of the forks on a forklift by comparing where the forks are in relation to the pallet. Objective: “By the end of the music theatre program, students will be able to relate types of music to particular dance steps.” Action verbs: chooses, describes, detects, differentiates, distinguishes, feels, hears, identifies, isolates, notices, recognizes, relates, selects, separates, touches.
• Set: readiness to act. It includes mental, physical, and emotional sets; these three sets are dispositions that predetermine a person’s response to different situations (sometimes called mindsets), i.e. mental, physical, or emotional preparation before an experience or task. Examples: knows and acts upon a sequence of steps in a manufacturing process; recognizes one’s abilities and limitations; shows desire to learn a new process (motivation). Note: this subdivision of the psychomotor domain is closely related to the “responding to phenomena” subdivision of the affective domain. Objective: “By the end of the physical education program, students will be able to demonstrate the proper stance for batting a ball.” Action verbs: arranges, begins, displays, explains, gets set, moves, prepares, proceeds, reacts, shows, states, volunteers, responds, starts.
• Guided response: attempt. The early stage in learning a complex skill, which includes imitation and trial and error; the learner imitates or follows instruction, and adequacy of performance is achieved by practicing. Examples: performs a mathematical equation as demonstrated; follows instructions to build a model; responds to the hand signals of the instructor while learning to operate a forklift. Objective: “By the end of the physical education program, students will be able to perform a golf swing as demonstrated by the instructor.” Action verbs: assembles, builds, calibrates, constructs, copies, dismantles, displays, dissects, fastens, fixes, follows, grinds, heats, imitates, manipulates, measures, mends, mixes, reacts, reproduces, responds, sketches, traces, tries.
• Mechanism: basic proficiency; the ability to perform a complex motor skill. This is the intermediate stage in learning a complex skill. Learned responses have become habitual, and the movements can be performed with some confidence and proficiency; the learner responds competently to a stimulus for action. Examples: uses a personal computer; repairs a leaking faucet; drives a car. Objective: “By the end of the biology program, students will be able to assemble laboratory equipment appropriate for experiments.” Action verbs: assembles, builds, calibrates, completes, constructs, dismantles, displays, fastens, fixes, grinds, heats, makes, manipulates, measures, mends, mixes, organizes, performs, shapes, sketches.
• Complex overt response: expert proficiency; the skillful performance of motor acts that involve complex movement patterns. Proficiency is indicated by a quick, accurate, and highly coordinated performance requiring a minimum of energy. This category includes performing without hesitation and automatic performance; for example, players often utter sounds of satisfaction or expletives as soon as they hit a tennis ball or throw a football, because they can tell by the feel of the act what the result will produce. The learner executes a complex process with expertise. Examples: maneuvers a car into a tight parallel parking spot; operates a computer quickly and accurately; displays competence while playing the piano. Objective: “By the end of the industrial education program, students will be able to demonstrate proper use of woodworking tools to high school students.” Action verbs: assembles, builds, calibrates, constructs, coordinates, demonstrates, dismantles, displays, dissects, fastens, fixes, grinds, heats, manipulates, measures, mends, mixes, organizes, sketches. Note: the key words are the same as for Mechanism, but with adverbs or adjectives indicating that the performance is quicker, better, more accurate, etc.
• Adaptation: adaptable proficiency; the ability to modify motor skills to fit a new situation. Skills are well developed, and the individual can modify movement patterns to fit special requirements, altering the response to reliably meet varying challenges. Examples: responds effectively to unexpected experiences; modifies instruction to meet the needs of the learners; performs a task with a machine that it was not originally intended to do (the machine is not damaged and there is no danger in performing the new task). Objective: “By the end of the industrial education program, students will be able to adapt their lessons on woodworking skills for disabled students.” Action verbs: adapts, adjusts, alters, changes, integrates, rearranges, reorganizes, revises, solves, varies.
• Origination: creative proficiency; the ability to create new movement patterns to fit a particular situation or specific problem. Learning outcomes emphasize creativity based upon highly developed skills; the learner develops and executes new integrated responses and activities. Examples: constructs a new theory; develops a new and comprehensive training program; creates a new gymnastic routine. Action verbs: arranges, builds, combines, composes, constructs, creates, designs, formulates, initiates, makes, modifies, originates, re-designs, trouble-shoots.
2.4. The Type of Value Scale
Scales of measurement refer to the ways in which variables/numbers are defined and categorized. Each scale of measurement has certain properties, which in turn determine the appropriateness of certain statistical analyses. There are four measurement scales (or types of data): nominal, ordinal, interval, and ratio.
2.4.1. Nominal value scale
A nominal value scale is a scale used to identify objects, individuals, or groups. A questionnaire item scored as yes (1) or no (0) is an example of a nominal value scale. Nominal values are the least like real numbers. Nominal basically refers to discrete categorical data, such as the name of your school, the type of car you drive, gender, religion, menu items selected, etc. Examples:
What is your gender? M – male, F – female
Which recreational activities do you participate in? 1 – hiking, 2 – fishing, 3 – boating, 4 – swimming, 5 – picnicking
What is your hair colour? 1 – black, 2 – brown, 3 – blonde, 4 – gray, 5 – other
A sub-type of nominal scale with only two categories (e.g. male/female) is called “dichotomous.” Nominal data can be clearly described in pie charts because they consist of clear categories that sum up to 100%.
2.4.2. Ordinal value scale
An ordinal value scale is a scale that takes the form of a ranking. An ordinal scale orders observations from low to high, with any ties attributed to lack of measurement sensitivity, e.g. scores from a questionnaire. Examples are first rank, second rank, and so on. A questionnaire with a Likert scale uses an ordinal value scale, such as disagree (1), doubtful (2), and agree (3). Ordinal refers to quantities that have a natural ordering: the ranking of favorite sports, the order of people's places in a line, the order of runners finishing a race, or, more often, the choice on a rating scale from 1 to 5. Further examples: class ranks, social class categories, etc. Examples:
How satisfied are you with our service? 1. very unsatisfied 2. unsatisfied 3. neutral 4. satisfied 5. very satisfied
How do you feel today? 1. very unhappy 2. unhappy 3. OK 4. happy 5. very happy
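Why scale type matters when summarizing data can be shown with a small sketch. For the two scale types introduced so far, only certain statistics are meaningful: nominal data support a mode but not a mean, while ordinal data additionally support a median. The coded responses below are assumed examples.

from statistics import mode, median

# Nominal: the numbers are only labels, so the mode is the only
# sensible "average"; a mean of the codes would be meaningless.
hair_colour = [1, 2, 2, 3, 2, 5]       # 1=black, 2=brown, 3=blonde, 5=other
print(mode(hair_colour))               # 2 (brown)

# Ordinal: the values are ranked, so the median is meaningful as well.
satisfaction = [1, 3, 4, 4, 5, 2, 4]   # 1=very unsatisfied ... 5=very satisfied
print(median(satisfaction))            # 4 (satisfied)

Interval and ratio data, introduced next, additionally support means and differences.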
2.4.3. Interval value scale
An interval value scale shares the properties of the nominal and ordinal scales, but in addition its intervals are equal and fixed, so interval values can be manipulated with mathematical functions. Examples are the number of times a person goes to the market (once, twice, etc.) or a final test score. Interval data are like ordinal data, except that we can say the intervals between each value are equally split. The most common example is temperature in degrees Fahrenheit: the difference between 29 and 30 degrees is the same magnitude as the difference between 78 and 79.
2.4.4. Ratio value scale
A ratio value scale is a true value scale: it has equal distances between values and can be manipulated with mathematical functions. Ratio scales are the easiest to understand because they are numbers as we usually think of them. The distances between adjacent numbers are equal on a ratio scale, and a score of zero on a ratio scale means that there is none of whatever is being measured. Most ratio scales are counts of things. Ratio data are interval data with a natural zero point, for example weight, the length of a street, time to complete a task, the size of an object, etc.
Reference:
Clay, B. (2001). Is this a trick question? (A short guide to writing effective test questions). Kansas State Department of Education.
Garrison, C., & Ehringhaus, M. (1995). Formative and summative assessment in the classroom.
Gronlund, N. E. (1981). Measurement and evaluation. New York: Macmillan Publishing Co.
Popham, W. J. (1981). Modern educational measurement. Englewood Cliffs, NJ: Prentice-Hall, Inc.
Purwanto. (2008). Evaluasi hasil belajar. Surakarta: Pustaka Pelajar.
CHAPTER III
TEST ASSESSMENT
Purpose: After learning this material, students are expected to:
- be able to explain test assessment
- be able to describe kinds of test assessment
- understand the advantages and disadvantages of using tests
[Figure 3.1. Alternative assessment: decision making in an educational setting. The diagram situates tests and other measurement within the wider field of non-test, qualitative assessment procedures.]
As Figure 3.1 shows, tests constitute only a small set of options, among a wide range of other options, for a language teacher to make decisions about students. The judgment emanating from a test is not necessarily more valid or reliable than one deriving from qualitative procedures, since both should meet reliability and validity criteria to be considered informed decisions. The area circumscribed within quantitative decision making is relatively
small, and represents a specific choice made by the teacher at a particular time in the course, while the vast area outside, which covers all non-measurement qualitative assessment procedures, represents the wider range of procedures and their general nature. This means that qualitative approaches, which result in descriptions of individuals, as contrasted with quantitative approaches, which result in numbers, can go hand in hand with the teaching and learning experiences in the class, and they can reveal more subtle shades of students’ proficiency.
A test is a method of measuring a person's ability, knowledge, or performance to complete certain tasks or demonstrate mastery of a skill or knowledge of content. A test is a systematic procedure for observing persons and describing them with either a numerical scale or a category system. Thus a test may give either qualitative or quantitative information. Two types of tests are objective tests and essay tests.
Essay tests are appropriate when:
• The group to be tested is small and the test is not to be reused.
• You wish to encourage and reward the development of student skill in writing.
• You are more interested in exploring the student’s attitudes than in measuring his/her achievement.
Objective tests are appropriate when:
• The group to be tested is large and the test may be reused.
• Highly reliable scores must be obtained as efficiently as possible.
• Impartiality of evaluation, fairness, and freedom from possible test scoring influences are essential.
Either essay or objective tests can be used to: 1) measure almost any important educational achievement a written test can measure; 2) test understanding and ability to apply principles; 3) test ability to think critically; 4) test ability to solve problems.
3.1. Objective
Objective tests measure both your ability to remember facts and figures and your understanding of course materials. These tests are often designed to make you think independently, so don't count on recognizing the right answer. Instead, prepare yourself for high-level critical reasoning and making fine discriminations to determine the best answer. Taking an objective examination is somewhat different from taking an essay examination. The objective examination may be composed of true/false, multiple choice, or matching responses. Occasionally a fill-in section is also included. There are certain things that you must remember to do as you take this kind of test. First, roughly decide how to divide your
time. Quickly glance over the pages to see how many kinds of questions are being used and how many there are of each kind. Secondly, carefully read the instructions and make sure that you understand them before you begin to work. Indicate your answers exactly as specified in the instructions. If your instructor has not indicated whether there is a penalty for guessing, ask him or her about it; then, if there is a penalty, do not guess.
3.1.1. Multiple choice
A multiple choice test has items formatted as multiple choice questions, and the candidate must choose which answer or group of answers is correct. A multiple choice question consists of two parts: 1) the stem, the statement or question, which identifies the question or problem, and 2) the choices, also known as the distracters. Usually, students are asked to select the one alternative that best completes a statement or answers a question. Multiple choice items can also provide an excellent basis for post-test discussion, especially if the discussion addresses why the incorrect responses were wrong as well as why the correct responses were right. Unfortunately, multiple choice items are difficult and time consuming to construct well. They may also appear too discriminating (picky) to students, especially when the alternatives are well constructed, and they are open to misinterpretation by students who read more into questions than is there. Multiple choice tests can be used to test the ability to:
1. Recall memorized information
2. Apply theory to routine cases
3. Apply theory to novel situations
4. Use judgment in analyzing and evaluating
Example of multiple choice: A three-year-old child can usually be expected to:
a. Cry when separated from his or her mother
b. Have imaginary friends
c. Play with other children of the same age
d. Constantly argue with older siblings
3.1.2. True/false questions
True/false questions present candidates with a binary choice: a statement is either true or false. This method presents problems depending on the number of questions. The true/false question is also a popular question type; it has only two options. True/false questions usually state the relation of two things to one another. Because the instructor is interested in knowing whether you know when and under what circumstances something is or
is not true, s/he usually includes some qualifiers in the statement. The qualifiers must be carefully considered. With the following qualifiers, you are wiser to guess "yes" if you don't know the answer, because you may stand some chance of getting the answer right: most, some, usually, sometimes, and great. On the other hand, with these next qualifiers, you should guess "no" unless you are certain that the statement is true: all, no, always, is, never, is not, good, bad, equal, less.
The following are advantages of true/false tests:
• They can test large amounts of content
• Students can answer 3-4 questions per minute
And the disadvantages are:
• They are easy
• It is difficult to discriminate between students who know the material and students who do not
• Students have a 50-50 chance of getting the right answer by guessing
• A large number of items is needed for high reliability
Examples of true/false questions:
1. Electrons are larger than molecules. a. True b. False
2. True or false? The study of plants is known as botany. a. True b. False
3. True or false? Is it recommended to take statements directly from the text to make good true/false questions? a. True b. False
3.1.3. Matching questions type
A matching question type is an item that provides defined terms and requires the test taker to match identifying characteristics to the correct terms. Matching questions give students some opportunity for guessing. Students must know the information well, in that they are presented with two columns of items for which they must establish relationships. If only one match is allowed per item, then once items become eliminated, a few of the latter ones may be guessed. A simple matching item consists of two
columns: one column of stems or problems to be answered, and another column of responses from which the answers are to be chosen. Traditionally, the column of stems is placed on the left and the column of responses is placed on the right.
Example: Directions: match the following!
Water A. NaCl
Discovered radium B. H2O
Salt C. Fermi
Ammonia D. NH3
E. Curie
3.1.4. Completion type
A completion (fill-in-the-blank) item provides the test taker with an identifying characteristic and requires the test taker to recall the correct term. Completion items are especially useful in assessing mastery of factual information when a specific word or phrase is important to know. There are several varieties of the completion type. The easier version provides a word bank of possible words that will fill in the blanks; for some exams, all words in the word bank are used exactly once. If a teacher wanted to create a test of medium difficulty, they would provide a test with a word bank, but some words may be used more than once and others not at all. The hardest variety is a fill-in-the-blank test in which no word bank is provided at all. This generally requires a higher level of understanding and memory than multiple choice.
Advantages:
• Good for who, what, where, when content
• Minimizes guessing (compare the true/false guessing odds quantified below)
• Encourages more intensive study: the student must know the answer rather than merely recognize it
• Can usually provide an objective measure of student achievement or ability
Disadvantages:
• Difficult to assess higher levels of learning, because the answers to completion items are usually limited to a few words
• Difficult to construct so that the desired response is clearly indicated
• May overemphasize memorization of facts
• Questions may have more than one correct answer
• Scoring is time consuming
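The guessing concern can be quantified. On true/false items a student has a 0.5 chance per item, so the probability of reaching a given score by blind guessing follows a binomial distribution; on completion items with no word bank that probability is effectively zero. A short sketch, assuming a hypothetical 20-item test and a 60% pass mark (both values are assumptions for illustration):

from math import ceil, comb

def p_pass_by_guessing(n_items, pass_fraction=0.6, p=0.5):
    """Chance of reaching the pass mark by blind guessing alone
    (binomial distribution; the pass mark is an assumed example)."""
    need = ceil(pass_fraction * n_items)
    return sum(comb(n_items, k) * p**k * (1 - p)**(n_items - k)
               for k in range(need, n_items + 1))

print(round(p_pass_by_guessing(20), 3))   # ~0.252 on 20 true/false items
print(round(p_pass_by_guessing(50), 3))   # ~0.101 on 50 items

This is why the true/false disadvantages above note that a large number of items is needed for high reliability: the chance of a meaningless passing score shrinks as the test lengthens.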
A completion item requires the student to answer a question or to finish an incomplete statement by filling in a blank with the correct word or phrase. For example: A subatomic particle with a negative electric charge is called a(n) ____________.
3.2. Essay
An essay test is a test that requires the student to compose responses, usually lengthy, of up to several paragraphs. Essay tests measure higher-level thinking. A typical essay test usually consists of a small number of questions to which the student is expected to recall and organize knowledge in logical, integrated answers. Such questions test higher-level processes such as analysis, synthesis, evaluation, and creativity. The distinctive feature of essay-type tests is the freedom of response: pupils are free to select, relate, and present ideas in their own words. Items such as short answer or essay typically require a test taker to write a response to fulfill the requirements of the item. In administrative terms, essay items take less time to construct. As an assessment tool, essay items can test complex learning objectives as well as the processes used to answer the questions. The items can also provide more realistic and generalizable tasks for a test. Finally, these items make it difficult for test takers to guess the correct answers and require test takers to demonstrate their writing skills as well as correct spelling and grammar.
Uses of essay tests:
a. Assess the ability to recall, organize, and integrate ideas.
b. Assess the ability to express oneself in writing.
c. Assess the ability to supply information.
d. Assess student understanding of subject matter.
e. Measure the knowledge of factual information.
The main advantages of essay and short answer items are that they permit students to demonstrate achievement of such higher-level objectives as analyzing and critical thinking. Written items offer students the opportunity to use their own judgment, writing styles, and vocabularies. They are less time consuming to prepare than any other item type. Research indicates that students study more efficiently for essay-type examinations than for selection (multiple choice) tests. Students preparing for essay tests focus on broad issues, general concepts, and interrelationships rather than on specific details, and this studying results in somewhat better student performance regardless of the type of exam they are given. Essay
tests also give the instructor an opportunity to comment on students' progress, the quality of their thinking, the depth of their understanding, and the difficulties they may be having.
The following are the advantages of essay tests:
• Students are less likely to guess
• Easy to construct
• Stimulates more study
• Allows students to demonstrate the ability to organize knowledge, express opinions, and show originality
Disadvantages:
• Can limit the amount of material tested, and therefore has decreased validity
• Subjective, potentially unreliable scoring
• Time consuming to score
Types of essay test:
3.2.1. Restricted response
The restricted response question usually limits both the content and the response. The content is usually restricted by the scope of the topic to be discussed, and limitations on the form of response are generally indicated in the question. Another way of restricting responses in essay tests is to base the questions on specific problems. For this purpose, introductory material like that used in interpretive exercises can be presented. Such items differ from objective interpretive exercises only in that essay questions are used instead of multiple choice or true/false items. Because the restricted response question is more structured, it is most useful for measuring learning outcomes requiring the interpretation and application of data in a specific area. Examples of restricted response: describe two situations that demonstrate the application of the law of supply and demand; state any five definitions of education.
Advantages of restricted response questions:
• restricted response questions are more structured
• they measure specific learning outcomes
• they provide for more ease of assessment
• any outcome measured by an objective interpretive exercise can be measured by a restricted response question
3.2.2. Extended response
An extended response question allows students to select the information that they think is pertinent, to organize the answer in accordance with their best judgment, and to integrate and evaluate ideas as they think suitable. No restriction is placed on the student as to the points he or she will discuss and the type of organization he or she will use. Such questions do not set limits on the length or exact content to be discussed. Teachers frame these questions in such a way as to give students the maximum possible freedom to determine the nature and scope of the answer, which should of course relate to the topic and be produced within the stipulated time frame. The student may select the points he or she thinks are most important, pertinent, and relevant, and arrange and organize the answer in whichever way he or she wishes; such questions are therefore also called free response questions. This enables the teacher to judge students’ abilities to organize, integrate, and interpret the material and to express themselves in their own words. It also gives an opportunity to comment on or look into students’ progress, the quality of their thinking, the depth of their understanding and problem-solving skills, and the difficulties they may be having. These skills interact with each other and with the knowledge and understanding the problem requires. Thus it is at the levels of synthesis and evaluation, and of writing skills, that this type of question makes its greatest contribution. Examples: 1) Describe at length the defects of the present-day examination system in the state of Maharashtra, and suggest ways and means of improving the examination system. 2) Describe the character of Hamlet. 3) Global warming is the next step to disaster: discuss.
Reference:
Clay, B. (2001). Is this a trick question? (A short guide to writing effective test questions). Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin.
Garrison, C., & Ehringhaus, M. (1995). Formative and summative assessment in the classroom.
Gronlund, N. E. (1981). Measurement and evaluation. New York: Macmillan Publishing Co.
Popham, W. J. (1981). Modern educational measurement. Englewood Cliffs, NJ: Prentice-Hall, Inc.
Purwanto. (2008). Evaluasi hasil belajar. Surakarta: Pustaka Pelajar.
CHAPTER IV
NON TEST ASSESSMENT
Purpose: After learning this material, students are expected to:
- be able to explain non-test assessment
- be able to describe kinds of non-test assessment
A non-test is an instrument other than an academic achievement test. The item writing procedure for non-test instruments is the same as the procedure for writing items for a learning achievement test: construct the test blueprint (grid), write items according to the blueprint, review the items, validate the items, try the items out, and refine the items based on the results of the trials.
4.1. Observation
Observation should follow an established plan or checklist organized around concrete, objective data, and it needs to be tied to the objectives of the course. Through observation, teachers can assess their students' abilities simply by observing their classroom behavior or completion of activities. By watching students as they work, teachers can identify signs of struggle and determine where a child may be experiencing academic difficulties. Because students often do not realize that they are being observed, teachers can ensure that the picture they receive of student understanding represents the student's actual abilities. For most practitioners, observation is a feature of everyday working life, and practitioners can often be found with a notebook and pen close to hand to jot down unplanned observations that can be added to normal recording systems at a later time. However, as previously discussed, specific observations should be planned. Prior to beginning the observation, practitioners should work through the stages outlined in the previous section and, as a part of this process, the most appropriate observational method should be selected from the range available. It will also be helpful to produce a cover sheet including such details as:
• child’s name
• child’s age
• date
• name of observer
• the specific setting or area of setting
• permissions gained
• aims and purpose of observation
• start and finish times
4.2. Interview
Interviews are the most frequently used method of personnel selection, but they are also used for school admissions, promotions, scholarships, and other awards. Interviews vary in their content and structure. In a structured interview, questions are prepared before the interview starts. An unstructured interview simply represents a free conversation between an interviewer and interviewee, giving the interviewer the freedom to adaptively or intuitively switch topics. Research has shown that unstructured interviews lack predictive validity, or show lower predictive validity than structured interviews. The best practices for conducting interviews are:
• A high degree of structure
• Selection of questions according to job requirements
• Assessment of aspects that cannot be better assessed with other methods
• Scoring with pre-tested, behavior-anchored rating scales
• Empirical examination of each question
• Rating only after the interview
• Standardized scoring (see the sketch below)
• Training of interviewers
Structured interviews can be divided into three types:
a. The behavioral description interview involves questions that refer to past behavior in real situations; it is also referred to as the job-related interview.
b. The situational interview uses questions that require interviewees to imagine hypothetical situations (derived from critical incidents) and state how they would act in such situations.
c. The multimodal interview combines the two approaches above and adds unstructured parts to ensure high respondent acceptance.
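Two of the practices listed above, behavior-anchored rating scales and standardized scoring, can be illustrated with a minimal sketch. The questions, anchor wordings, and weights below are invented for illustration; a real instrument would derive them from a job analysis.

# Each structured-interview question is rated against behavioral anchors;
# the final score is a standardized weighted aggregate of the ratings.
# Questions, anchors, and weights here are illustrative assumptions.
ANCHORS = {1: "no relevant behavior described",
           3: "partially effective behavior described",
           5: "clearly effective behavior described"}

def interview_score(ratings, weights):
    """Combine per-question anchor ratings; weights should sum to 1."""
    assert set(ratings) == set(weights)
    return sum(ratings[q] * weights[q] for q in ratings)

ratings = {"past_conflict": 5, "hypothetical_deadline": 3, "teamwork": 4}
weights = {"past_conflict": 0.4, "hypothetical_deadline": 0.3, "teamwork": 0.3}
print(interview_score(ratings, weights))   # 4.1 on the anchored 1-5 scale

Because every interviewer applies the same anchors and the same aggregation rule, scores become comparable across candidates, which is the point of standardized scoring.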
Analyses of the predictive validity of interviews for job performance have shown that they are good predictors of job performance, that they add incremental validity above and beyond general mental ability, and that behavioral description interviews show a higher validity than situational interviews. Interviews are less predictive of academic performance than of job-related outcomes. Predictive validity probably also depends on the content of the interview, but the analyses aggregated interviews with different contents.
4.3. Questionnaire
Questionnaires are the most commonly used method for collecting information from program participants when evaluating educational and extension programs. There are nine steps involved in the development of a questionnaire:
1. Decide the information required.
2. Define the target respondents.
3. Choose the method(s) of reaching your target respondents.
4. Decide on question content.
5. Develop the question wording.
6. Put questions into a meaningful order and format.
7. Check the length of the questionnaire.
8. Pre-test the questionnaire.
9. Develop the final survey form.
4.4. Portfolios
A portfolio is a collection of student work with a common theme or purpose. Like a photographer's portfolio, it should contain the best examples of the student's work. For subjects that are paper-based, the collection of a portfolio is simple. Homework is a structured practice exercise that usually plays a part in grading; sometimes instructors assign reading or other homework which covers the theoretical aspects of the subject matter, so that class time can be used for more hands-on practical work. In a portfolio assessment, a teacher looks not at one piece of work as a measure of student understanding, but instead at the body of work the student has produced over a period of time. To allow for a portfolio assessment, a teacher must compile student work throughout the term. This is commonly accomplished by providing each student with a folder in which to store essays or other large activities. Upon
compilation of the portfolio, the teacher can review the body of work and determine the degree to which the work indicates the student's understanding of the content.
Advantages of portfolio assessment:
• Assesses what students can do and not just what they know.
• Engages students actively.
• Fosters student-teacher communication and depth of exploration.
• Enhances understanding of the educational process among parents and in the community.
• Provides goals for student learning.
• Offers an alternative to traditional tests for students with special needs.
The use of the portfolio as an assessment tool is a process with multiple steps. The process takes time, and all of the component parts must be in place before the assessment can be utilized effectively.
a. Decide on a purpose or theme. General assessment alone is not a sufficient goal for a portfolio; it must be decided specifically what is to be assessed. Portfolios are most useful for addressing the student’s ability to apply what has been learned. Therefore, a useful question to consider is: What skills or techniques do I want the students to learn to apply? The answer to this question can often be found in the school curriculum.
b. Consider what samples of student work might best illustrate the application of the standard or educational goal in question. Written work samples, of course, come to mind. However, videotapes, pictures of products or activities, and testimonials are only a few of the many different ways to document achievement.
c. Determine how samples will be selected. A range of procedures can be utilized here. Students, perhaps in conjunction with parents and teachers, might select work to be included, or a specific type of sample might be required by the teacher, the school, or the school system.
d. Decide whether to assess the process and the product, or the product only. Assessing the process would require some documentation regarding how the learner developed the product. For example, did the student use the process for planning a short story, or utilize the experimental method, that was taught in class? Was it used correctly? Evaluation of the process will require a procedure for accurately documenting the process used; the documentation could include a log or video of the steps or an interview with the student. Usually, if both the process and the product are to be evaluated, a separate scoring system will have to be developed for each.
e. Develop an appropriate scoring system. Usually this is best done through the use of a rubric, a point scale with descriptors that explain how the work will be evaluated. Points are allotted, with the highest quality work getting the most points. If the descriptors are clear and specific, they become goals for which the student can aim. There should be a separate scale for each standard being evaluated. For example, if one standard being assessed is the use of grammatically correct sentence structure, five points might be allotted if all sentences are grammatically correct; then a specific number of errors would be identified for all other point values, with zero points given if there are more than a certain number of errors. It is important that the standards for evaluation be carefully explained. If we evaluate for clarity of writing, then an operational description of what is meant by clarity should be provided. The number of points available should be small enough to be practical and meaningful; an allotment of 20 points for clarity is not workable, because an evaluator cannot really distinguish between a 17- and an 18-point product with regard to clarity.
f. Share the scoring system with the students. Qualitative descriptors of how the student will be evaluated, known in advance, can guide learning and performance.
g. Engage the learner in a discussion of the product. Through the process of discussion, the teacher and the learner can explore the material in more depth, exchange feelings and attitudes with regard to the product and the learning process, and reap the greatest advantage of effective portfolio implementation.
4.5. Case Studies
Case studies and problem-solving assignments can be used to apply knowledge. This type of assignment requires the student to place him- or herself in, or react to, a situation where their prior learning is needed to solve the problem or evaluate the situation. Case studies should be realistic and practical, with clear instructions.
4.6. Project
Projects are usually designed so that students can apply many of the skills they have developed in the course by producing a product of some kind. Usually project assignments are given early in the course, with a completion date toward the end of the quarter. By asking students to complete a project, teachers can see how well their pupils can apply taught information. Successful completion of a project requires a student to translate their learning into the completion of a task. Project-based assessment more closely approximates how students will be assessed in the real world, as employers will not ask their employees to take
tests, but will instead judge their merit upon the work they complete. The project is an example of a performance task.
Reference:
Arvey, R. D., & Campion, J. E. (1982). The employment interview: A summary and review of recent research. Personnel Psychology.
Clay, B. (2001). Is this a trick question? (A short guide to writing effective test questions). Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin.
Damiani, V. B. (2004). Portfolio assessment in the classroom. National Association of School Psychologists.
Garrison, C., & Ehringhaus, M. (1995). Formative and summative assessment in the classroom.
Gronlund, N. E. (1981). Measurement and evaluation. New York: Macmillan Publishing Co.
Janz, T., Hellervik, L., & Gilmore, D. C. (1986). Behavior description interviewing (BDI). Boston: Allyn & Bacon.
Latham, G. P., Saari, L. M., Pursell, E. D., & Campion, M. A. (1980). The situational interview (SI). Journal of Applied Psychology.
Popham, W. J. (1981). Modern educational measurement. Englewood Cliffs, NJ: Prentice-Hall, Inc.
Purwanto. (2008). Evaluasi hasil belajar. Surakarta: Pustaka Pelajar.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology. Psychological Bulletin.
Schuler, H. (2002). Das Einstellungsinterview (Multimodales Interview [MMI]). Göttingen: Hogrefe.
CHAPTER V
VALIDITY TEST
Purpose:
- Be able to explain the definition of test validity
- Be able to explain the function of test validity
- Be able to test for validity
Test validity is the extent to which a test (such as a chemical, physical, or scholastic test) accurately measures what it purports to measure. Validity is divided into various kinds, such as content validity, criterion validity, and construct validity.
5.1. Content Validity
Content validity is an estimate of how much a measure represents every single element of a construct. Content validity is the extent to which the content of the test matches the instructional objectives. For example, a semester or quarter exam that only includes content covered during the last six weeks is not a valid measure of the course's overall objectives; it has very low content validity.
5.2. Criterion Validity
Criterion validity assesses whether a test reflects a certain set of abilities. If the criterion is obtained some time after the test is given, one is studying predictive validity. If
the test score and criterion score are determined at essentially the same time, one is studying concurrent validity. Concurrent validity measures the test against a benchmark test, and a high correlation indicates that the test has strong criterion validity. In concurrent validity, we assess the operationalization's ability to distinguish between groups that it should theoretically be able to distinguish between. For example, if we come up with a way of assessing manic depression, our measure should be able to distinguish between people who are diagnosed with manic depression and those diagnosed as paranoid schizophrenic. If we want to assess the concurrent validity of a new measure of empowerment, we might give the measure to both migrant farm workers and farm owners, theorizing that our measure should show that the farm owners are higher in empowerment. As in any discriminating test, the results are more powerful if you are able to show that you can discriminate between two groups that are very similar. If end-of-year math tests in 4th grade correlate highly with the statewide math tests, they have high concurrent validity.
Predictive validity is a measure of how well a test predicts abilities. It involves testing a group of subjects for a certain construct and then comparing them with results obtained at some point in the future. In predictive validity, we assess the operationalization's ability to predict something it should theoretically be able to predict. For instance, we might theorize that a measure of math ability should be able to predict how well a person will do in an engineering-based profession. We could give our measure to experienced engineers and see if there is a high correlation between scores on the measure and their salaries as engineers. A high correlation would provide evidence for predictive validity: it would show that our measure can correctly predict something that we theoretically think it should be able to predict.
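In practice, checking predictive validity usually comes down to correlating scores on the measure with the later criterion and inspecting the coefficient. A minimal sketch, with invented data (a real validation study would of course need a properly drawn sample):

from statistics import correlation   # Pearson's r (Python 3.10+)

# Hypothetical data: scores on a math-ability measure, and a later
# criterion such as performance in an engineering course.
test_scores   = [62, 71, 55, 80, 68, 75, 58, 85]
later_outcome = [65, 70, 52, 84, 66, 78, 60, 88]

r = correlation(test_scores, later_outcome)
print(round(r, 2))   # a high positive r is evidence of predictive validity

For concurrent validity the computation is identical; the only difference is that the criterion scores are collected at essentially the same time as the test scores.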
5.3. Construct Validity
Construct validity is an assessment of how well you have translated your ideas or theories into actual programs or measures. Construct validity defines how well a test or experiment measures up to its claims: a test designed to measure depression must measure only that particular construct, not closely related constructs such as anxiety or stress. Construct validity refers to the degree to which inferences can legitimately be made from the operationalizations in your study to the theoretical constructs on which those operationalizations were based. Like external validity, construct validity is related to generalizing. But where external validity involves generalizing from your study context to other people, places, or times, construct validity involves generalizing from your program or measures to the concepts behind your program or measures.
Convergent validity tests that constructs that are expected to be related are, in fact, related. In convergent validity, we examine the degree to which the operationalization is similar to (converges on) other operationalizations that it theoretically should be similar to. For instance, to show the convergent validity of a Head Start program, we might gather evidence that shows that the program is similar to other Head Start programs. Or, to show the convergent validity of a test of arithmetic skills, we might correlate the scores on our test with scores on other tests that purport to measure basic math ability, where high correlations would be evidence of convergent validity.
Discriminant validity tests that constructs that should have no relationship do, in fact, not have any relationship (it is also referred to as divergent validity). In discriminant validity, we examine the degree to which the operationalization is not similar to (diverges from) other operationalizations that it theoretically should not be similar to. For instance, to show the discriminant validity of a Head Start program, we might gather evidence that shows that the program is not similar to other early childhood programs that don't label themselves as Head Start programs. Or, to show the discriminant validity of a test of arithmetic skills, we might correlate the scores on our test with scores on tests of verbal ability, where low correlations would be evidence of discriminant validity.
Reference:
Clay, B. (2001). Is this a trick question? (A short guide to writing effective test questions). Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin.
Garrison, C., & Ehringhaus, M. (1995). Formative and summative assessment in the classroom.
Gronlund, N. E. (1981). Measurement and evaluation. New York: Macmillan Publishing Co.
Popham, W. J. (1981). Modern educational measurement. Englewood Cliffs, NJ: Prentice-Hall, Inc.
Purwanto. (2008). Evaluasi hasil belajar. Surakarta: Pustaka Pelajar.
CHAPTER VI
RELIABILITY TEST

Purpose:
- Be able to explain the definition of a reliability test
- Be able to explain the function of a reliability test
- Be able to carry out a reliability test

Reliability relates to the consistency of an assessment. Reliability is a necessary but not sufficient condition for validity. For instance, if the needle of a scale is always five pounds away from zero, the scale always over-reports my weight by five pounds. The measurement is consistent, but it is consistently wrong, so the measurement is not valid. A reliable assessment is one that consistently achieves the same results with the same (or a similar) cohort of students. Various factors affect reliability, including ambiguous questions, too many options within a question paper, vague marking instructions, and poorly trained markers. Reliability testing methods can be divided into two groups: external consistency methods and internal consistency methods.

6.1. External Consistency Reliability

Reliability as external consistency holds that a test is reliable if, after being administered several times, it gives relatively consistent results. The methods included in this group are the test-retest method and the parallel forms method.
Table 6.1 Test-Retest and Parallel Forms

1. Test-Retest. Procedure: the same test is given twice to the same students at different times. Technique: product-moment correlation between the scores of test 1 and test 2.
2. Parallel Forms. Procedure: two similar (parallel) tests are given to the same group of learners. Technique: product-moment correlation between the scores of instrument 1 and instrument 2.

6.1.1. Test-Retest Reliability

Test-retest reliability is used to assess the consistency of a measure from one time to another: the reliability of an achievement test is measured by administering the same test repeatedly. The weakness of this method is that if the time interval is too short, learners may still remember the tested material at the second administration, so the second result may be better than the first. The reliability coefficient in this case is simply the correlation between the scores obtained by the same persons on the two administrations of the test. If the first test result parallels the second test result, the test is said to be reliable. The analysis is done by finding the correlation between the first and second test results, using the Pearson product-moment correlation coefficient (r). The value of r always falls within the range -1 to +1.

Example:

No  Student name  Score test 1 (X)  Score test 2 (Y)
1   Agustina      78                80
2   Feby          80                85
3   Antoni        77                80
4   Chandra       90                85
5   Dionisius     70                75
6   Fitriani      73                78
etc.

The formula:

$$r_{XY} = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\{N\sum X^{2} - (\sum X)^{2}\}\{N\sum Y^{2} - (\sum Y)^{2}\}}}$$
Description:
N = number of students
X = score on test 1
Y = score on test 2

6.1.2. Parallel Forms Reliability

Parallel forms reliability is used to assess the consistency of the results of two tests constructed in the same way from the same content domain. This method requires two sets of questions that have the same goals, level of difficulty, and composition of material, but different items; in other words, the two tests must be parallel. The reliability coefficient is obtained by correlating the results of the first test with the results of the second test.

Example:

No  Student name  Result of Instrument 1 (X)  Result of Instrument 2 (Y)
1   Fransiska     78                          80
2   Johnson       80                          85
3   Leona         77                          80
4   Ratya         90                          85
5   Febriyanti    70                          75
6   Karmila       73                          78
etc.

The formula:

$$r_{XY} = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\{N\sum X^{2} - (\sum X)^{2}\}\{N\sum Y^{2} - (\sum Y)^{2}\}}}$$

Description:
N = number of students
X = score from instrument test 1
Y = score from instrument test 2
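As a worked illustration, the sketch below applies the product-moment formula above to the six pairs of scores in the test-retest example table (the rows hidden behind "etc." are omitted, so the result only illustrates the procedure). The same function applies unchanged to the parallel forms example.

```python
# A small sketch applying the product-moment formula to the six
# test-retest scores from the example table above.
from math import sqrt

def product_moment_r(x, y):
    """Pearson product-moment correlation, written out as in the formula."""
    n = len(x)
    num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    den = sqrt((n * sum(a * a for a in x) - sum(x) ** 2)
               * (n * sum(b * b for b in y) - sum(y) ** 2))
    return num / den

test_1 = [78, 80, 77, 90, 70, 73]  # Agustina, Feby, Antoni, Chandra, Dionisius, Fitriani
test_2 = [80, 85, 80, 85, 75, 78]

print(f"test-retest reliability r_XY = {product_moment_r(test_1, test_2):.3f}")
```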
6.2. Internal Consistency Reliability

Reliability as internal consistency holds that a test is reliable if its items yield consistent measurement results. The test-retest and parallel forms methods have the disadvantage that they are time-consuming. In most cases the researcher wants to estimate reliability from a single administration of a test. This requirement has led to measures of internal consistency, or homogeneity. Internal consistency measures consistency within the tool. Several internal consistency methods exist; all have one thing in common, namely that the measurement is based on the results of a single administration. In the present study, the split-half technique and Cronbach's Alpha method were used to estimate internal consistency reliability. SPSS 17 statistical software was used for the split-half analyses (Spearman-Brown formula and Guttmann's formula) and for Cronbach's Alpha; MS Excel was used for the split-half calculation with Flanagan's formula.

6.2.1. Split-Half Reliability Method

In the split-half reliability method, the instrument is first divided into two equivalent halves and the correlation coefficient between the scores on these half-tests is found. This correlation coefficient denotes the reliability of the half-test. The self-correlation coefficient of the whole test is then estimated by different formulas. The measuring instrument can be divided into two halves in a number of ways, but the best way is to find the correlation coefficient between the scores on odd-numbered and even-numbered items. In the present study, the correlation coefficient was calculated using the following formulas:

a. Spearman-Brown Formula

The Spearman-Brown formula was designed to estimate the reliability of a test n times as long as the one for which we know a self-correlation. From the reliability of the half-test, the self-correlation coefficient of the whole test is estimated by the following Spearman-Brown formula:

$$r_{tt} = \frac{2\,r_{hh}}{1 + r_{hh}}$$

Where:
rtt = reliability of the total test estimated from the reliability of one of its halves (reliability coefficient of the whole test)
rhh = self-correlation of a half-test (reliability coefficient of the half-test)

b. Rulon/Guttmann's Formula

An alternative method for finding split-half reliability was developed by Rulon. It requires only the variance of the differences between each person's scores on the two half-tests and the variance of total scores. These two values are substituted into the following formula, which yields the reliability of the whole test directly:

$$r_{tt} = 1 - \frac{SD_{d}^{2}}{SD_{x}^{2}}$$

Where:
rtt = reliability of the test
SDd = SD of the differences between the half-test scores
SDx = SD of the scores of the whole test

c. Flanagan Formula

Flanagan gave a parallel formula for finding reliability using the split-half method. Flanagan's formula for reliability is described below:

$$r_{tt} = 2\left(1 - \frac{SD_{1}^{2} + SD_{2}^{2}}{SD_{t}^{2}}\right)$$

Where:
rtt = reliability of the test
SD1 = SD of the scores on the 1st half
SD2 = SD of the scores on the 2nd half
SDt = SD of the scores of the whole test

6.2.2. Cronbach's Alpha Method

Cronbach's Alpha is mathematically equivalent to the average of all possible split-half estimates. The statistical analysis computer program SPSS 17 was used to calculate Cronbach's Alpha (α).
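The sketch below, using an invented four-item score matrix, pulls the internal consistency estimates together: an odd/even split with the Spearman-Brown correction, the Rulon/Guttmann and Flanagan variants, and Cronbach's Alpha. The text above does not give the Alpha formula, so the standard one is assumed here: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

```python
# A minimal sketch of the internal consistency estimates described
# above, computed on a hypothetical 4-item, 6-student score matrix.
import numpy as np

# rows = students, columns = items; these scores are invented
scores = np.array([
    [4, 3, 4, 5],
    [2, 2, 3, 3],
    [5, 4, 4, 5],
    [3, 3, 2, 4],
    [4, 4, 5, 4],
    [1, 2, 2, 2],
])

# Split into odd-numbered and even-numbered items (columns 1,3,... vs 2,4,...)
odd_half = scores[:, 0::2].sum(axis=1)
even_half = scores[:, 1::2].sum(axis=1)
total = scores.sum(axis=1)

# Half-test correlation, stepped up with the Spearman-Brown formula
r_hh = np.corrcoef(odd_half, even_half)[0, 1]
r_spearman_brown = 2 * r_hh / (1 + r_hh)

# Rulon/Guttmann: variance of half-score differences over variance of totals
r_rulon = 1 - np.var(odd_half - even_half) / np.var(total)

# Flanagan: variances of the two halves over variance of totals
r_flanagan = 2 * (1 - (np.var(odd_half) + np.var(even_half)) / np.var(total))

# Cronbach's Alpha from item variances (standard formula, assumed above)
k = scores.shape[1]
alpha = k / (k - 1) * (1 - scores.var(axis=0).sum() / total.var())

print(f"Spearman-Brown: {r_spearman_brown:.3f}")
print(f"Rulon/Guttmann: {r_rulon:.3f}")
print(f"Flanagan:       {r_flanagan:.3f}")
print(f"Cronbach alpha: {alpha:.3f}")
```

Note that the Rulon and Flanagan formulas are algebraically equivalent, so the sketch prints the same value for both; the Spearman-Brown estimate can differ because it assumes the two halves have equal variances.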
Reference:
Clay, B. 2001. Is This a Trick Question? (A Short Guide to Writing Effective Test Questions). Kansas State Department of Education.
Cronbach, L. J., & Meehl, P. E. 1955. Construct Validity in Psychological Tests. Psychological Bulletin.
Garrison, C., & Ehringhaus, M. 1995. Formative and Summative Assessment in the Classroom.
Gronlund, N. E. 1981. Measurement and Evaluation. New York: Macmillan Publishing Co.
Popham, W. J. 1981. Modern Educational Measurement. Englewood Cliffs, NJ: Prentice Hall, Inc.
Purwanto. 2008. Evaluasi Hasil Belajar. Surakarta: Pustaka Pelajar.

Curriculum Vitae

Kadek Ayu Astiti, S. Pd., M. Pd., was born in Singaraja on September 28, 1988. She is the second child of the couple Ni Ketut Sudi and Made Suarsini. Website address: www.kadekayuastiti.blogspot.com. History of education: elementary school No. 6 Kampung Baru Singaraja-Bali; SMP Negeri 3 Singaraja-Bali; SMA N 1 Singaraja-Bali; S1 in Physics Education at Ganesha University of Education; S2 in Science Education at Ganesha University of Education. Employment history: laboratory assistant at SMP N 1 Singaraja-Bali (2010-2011); teacher at SMP N 1 Singaraja-Bali (2011-2013); lecturer in physics education courses at the University of Nusa Cendana (2014-present).