Abstract: This lesson focuses on the development of evaluation tools - based on an example taken from an evaluation of NITOL courses
Authors: Bodil and Fin Ask
Date:
The course lessons remain the property of their authors. The
course participants may freely utilise the lessons for their personal
use. However, if they want to use the lessons for teaching or
other courses they must contact the author directly for a more
precise agreement.
Copyright: Bodil and Fin Ask / HIA
Collaboration in open learning does not stop at the door leading
to the topic of evaluation, so in this lesson we want to guide
you through this door. We will examine evaluation as a professional
subject and clarify it somewhat, but we will finish by considering
evaluation as an aspect of collaboration.
Evaluation is one of those alien-sounding words which eventually
take on native hues, but there is no well-established tradition
for it in this country. In the context of pedagogy it was initially
connected to the phenomenon of assigning grades in school. A teacher
awards grades after evaluating an essay or a mathematical assignment,
but grades are also awarded for more intangible things, for example
grades given for behaviour, diligence and orderliness. Thus we
can infer that there is a process of evaluation prior to awarding
a grade. It is feasible to evaluate a piece of work without giving
a grade for it, but a piece of work ought to have been evaluated
before a grade is awarded for it. A number of people in Norway
have expressed the wish to use the term assessment (vurdering
- which is more Norwegian) rather than the foreign-sounding
evaluation (evaluering), but this has not gained wide recognition.
There is no doubt that the former term precisely captures the core
of the term evaluation, but many people feel that it is not
as broad. Evaluation comprises both the assessment itself
and the conveying of the outcome. When we use the term evaluation,
we are making it clear that our line of thought agrees with today's
prevalent attitudes on assessment in the context of education.
Evaluation in contemporary education institutions covers far wider
an area than just the assessment of the students and their performance
in various subjects. It may equally well apply to the assessment
of teachers' performance, attitudes and significance, or even
to the school environment as a whole. Evaluation
could just as well be applied to the objectives for the institution's
activities as to the processes taking place inside it, obviously
also comprising the results of everything. Hence, it is no longer
only teachers who evaluate. No matter what we evaluate, however,
and no matter who is doing the evaluation, an inherent part of
the process is to convey the outcome.
Evaluation is always a question of assessing something - within
a loose or a more rigid framework. Admittedly, it is not a process
restricted to institutions of learning. We are all concerned with
evaluation every day, wherever we go. It would have been a constructive
step if we could remove the halo surrounding the positive connotations
of the term evaluation, and bring it down to a more mundane level
so that we could fully apply ourselves and our own thinking and
experience. Even if many people have doubts when they are asked
to make an evaluation, all of us are thoroughly familiar with
the matter at hand. We have been evaluating the world since we
were looking out at it from the cradle, conveying our assessment
by means of tears and smiles.
From a linguistic point of view then, evaluating means putting
a value on something. Initially this is a question of whether
you like or do not like something, what you yourself think. You
will then convey your view based on what you think. That is
usually not so hard, provided we feel we can trust the person we
are talking to, because then we dare to speak our mind. And of course,
we do have opinions - usually - about those topics that occupy
our attention.
In many situations we know clearly whether we like something or
not, and we know equally well why. However, this is not always
the case, and it is mildly interesting that it is not always easy
to state exactly why we 'feel' the way we do about the case in
question. We may be crystal clear on the conclusion, but the reasons
why may be diffuse. A psychological 'game' of thoughts and emotions
takes place in our conscious and unconscious mind, which in relaxed
situations may yield genuine causes and in stressed situations
rationalised causes, while we nevertheless always believe that
we are speaking the truth when we state that our reason is ...
In professional contexts where we proceed more carefully, the
psychological game may be simpler and the causes more real. One
part of maturing in a profession is training the ability to employ
professional, objective reasoning for the standpoint one adopts,
as well as the ability to comprehend and assess the reasons of
others. However, many professional issues may be so complex that
both time and effort are required in order to arrive at a reasonable
point of view.
Thus, generally speaking, it is perhaps sufficient to point out that assessment and the reasons for such an assessment are two separate matters, and that both of them are interesting in the context of evaluation.
In many contexts we use various tools for the evaluations we make.
If we want to judge such units as length, weight, volume or time,
we have developed measuring instruments to aid us. This probably
started when we measured distance in paces or feet. We may scoff
at this today, but it is still a better justification for stating
something about distance when we can say "I have paced it
out" instead of "I can tell by looking at it".
What we are striving to do here is to illuminate an essential
aspect of the evaluation process: If we are to evaluate something,
we need something to evaluate it against. Evaluation will always
be a comparison. A concrete and direct comparison is when we use
a balance scale to weigh something. At one end of the balance
we have the object with the unknown weight, call it A. At the
other end of the balance we have the object with a known weight,
call it B. The scales tell us whether A weighs as much as
B, more than B, or less than B.
Measuring a length A means comparing it with a known length B, which
may be a yardstick. When we have a suitable measure for what we
want to measure, we say that the measurement is valid.
Using the yardstick is a valid means of measuring length, but
not weight.
When a measurement can be performed accurately and reliably, we
say the measurement is reliable. The degree of reliability
is revealed when you measure the same object a number of times,
or when a number of people measure the same object. Pacing to
measure length is a valid way of measuring, but it is not sufficiently
reliable for cutting building materials.
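The difference between validity and reliability can be made concrete with a small numerical sketch. Assume, purely for illustration, that the same distance is paced out five times; the figures below are invented, and the spread of the repeated measurements then indicates how reliable pacing is:

```python
import statistics

# Hypothetical repeated measurements of the same distance, in metres,
# obtained by pacing it out five times.
paced = [12.1, 11.8, 12.5, 11.6, 12.4]

mean = statistics.mean(paced)
spread = statistics.stdev(paced)  # a large spread means low reliability

print(f"Mean: {mean:.2f} m, spread: {spread:.2f} m")
```

The measurement is valid (paces do measure length), yet the spread of roughly a third of a metre shows why it is not reliable enough for cutting building materials.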
Everything is perfectly easy when we are measuring objects that
can be measured with real measuring instruments, as validity and
reliability may be documented directly. However, this is rarely
the case when we want to evaluate something. For such purposes
we must often create our own measuring tools which will not always
be adequate in every way, but may nevertheless be better than
no measuring tools at all. Consider our calendar. We need a measurement
for Time, capital T. We have made a calendar that tells us we
live in 1997. But everybody knows that time did not commence 1997
years ago. We have selected an arbitrary zero point, and count the
years before and after. This works fine while we remain in our
cultural environment, but raises problems when we encounter the
Jewish calendar for example, which starts 3761 years before ours,
using another artificial zero point. Time commenced neither 1997
nor 5758 years ago. Our calendar is not a measuring instrument for
Time, but it will serve to keep track of the passing years.
Thus we may claim that we can make useful scales of measurement
even though they are not perfect. Consider some examples taken
from situations where evaluation is applied more in the sense
we use it.
In ski jumping, the style of the jump is measured on a scale ranging
from 1 to 20. Each jumper is judged against the perfect jump as
it has been described to serve as a measure. There are rules stating
what to deduct for and how much. Thus there are absolute requirements
that must be fulfilled if 20 is to be awarded for style, and these
apply regardless of wind and other weather conditions! Evaluating
against clear-cut norms like this is called absolute evaluation.
When cross-country ski athletes compete in a 30 kilometre race,
the person using the shortest time will win, though his/her time
would not have given more than a fifth place in another 30 km
race the week before! Evaluation of the individual thus occurs
in relation to the performance of others. This is relative
evaluation.
So what if we want to evaluate an essay? Is there a norm dictating
how an essay should be written? Mind you, this must be a norm
any teacher would subscribe to, and which can be followed for
any grade and any subject. Obviously there is no such norm. Evaluation
is left to a teacher's judgement, thus becoming mainly subjective
evaluation.
If, on the other hand, you have a standardised test in a subject
using questions requiring only True-False answers, and you also
have a key for the test stating how many correct answers are required
for one grade or another to be awarded on the appropriate scale
of grades, then you have a system of objective evaluation.
(Anybody grading the answers will arrive at the same results).
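The objectivity of such a test lies in the grading being purely mechanical. A minimal sketch, in which the answer key, the responses and the grade boundaries are all invented for illustration:

```python
# Illustrative sketch of objective evaluation: a True/False test scored
# against a fixed key, with a fixed table of grade boundaries.
# The key, the answers and the boundaries are hypothetical.
KEY = [True, False, True, True, False, True, False, True]

def grade(answers):
    """Count correct answers and look up the grade in a fixed table."""
    correct = sum(a == k for a, k in zip(answers, KEY))
    # Minimum number of correct answers required for each grade
    for minimum, g in [(7, "A"), (6, "B"), (5, "C"), (4, "D")]:
        if correct >= minimum:
            return correct, g
    return correct, "F"

# Any grader running this on the same answers gets the same result
print(grade([True, False, True, False, False, True, True, True]))
```

Because the key and the boundaries are fixed in advance, no judgement enters the grading step, which is exactly what makes the evaluation objective.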
Nevertheless, with all this in mind, when evaluating the matters we want to know about in the context of teaching, there is every reason to query the validity of the 'measuring tools' we utilise. Do they 'measure' what we think they measure?
The material in this section has been included to provide a broader
understanding of the processes of evaluation. Any evaluation we
perform may be judged and described by means of the concepts outlined
above. Now we will go on to addressing evaluation which is directly
relevant for anybody who wants to assess open-learning course
programmes. We intend to do this using an evaluation questionnaire
which has been utilised for evaluating what students think about
course programmes within NITOL.
At the back of this lesson you will find an evaluation questionnaire
which has been used for NITOL courses. It is quite comprehensive.
Most questions can nevertheless be answered quickly by ticking
off the yes/no box or another alternative. We would nevertheless
offer the following advice to those issuing a questionnaire: Make
it as brief as absolutely possible!
The purpose of the first section of this questionnaire is to establish
factual information about the persons attending the course, their
background and preparation for the course, and the resources available
to them at their place of work/study.
The second section has 15 personal questions to ascertain what
the working conditions have been like for the respondent, how
these have been utilised, and whether the respondent enjoyed working
on the course and whether or not he or she was motivated while
working on the course material. This is followed by eight questions
for course evaluation.
The evaluation questionnaire addresses areas which are typical
for open and flexible learning. The intention is that it should
illuminate how students utilise electronic network services, for
example e-mail and conferences. A further aim is to ascertain
whether students themselves take initiatives to establish networks,
so-called virtual learning environments with other students, or
whether they regard this as a waste of time. Purely technical
aspects are also evaluated considering background skills and use.
The availability of technical resources, which in this form of
open and flexible learning programmes is a necessity, is also
assessed.
A teaching programme so heavily based on technology as these NITOL
courses might quickly become difficult for some people. It is
of course also important to obtain information about this.
The overall intention has been to determine how end users/ students
think such open learning programmes work. Many of the questions,
however, could equally well have been given to supervisors/student
assistants and to course developers/instructors, and this has
in fact been done.
If we consider the types of expected answers in the evaluation
questionnaire, they are widely different. The purpose of the first
section is to collect facts and descriptive information about
and concerning the students. This is therefore not evaluative.
By ticking off alternatives for yes, no or others, students may
quickly cover a lot of ground. However, space has also been set
aside for students to explain why they tick off the selected alternatives.
The second section of the questionnaire requests evaluation of
various matters, and the response alternatives reflect this. We
encounter a series of different measurement scales, thus giving
rise to the question: Which are the requirements we should have
for a qualitative scale of measurement?
We stated above that measurements of various kinds might serve
as evaluation tools. The measuring of weight, length or volumes
utilises quantitative measuring scales, meaning that the result
is expressed as an amount. If the measurement we want concerns
the comparison of amounts, quantitative measurements may be good
tools.
However, if someone asks us how we liked a particular film we just saw we have a totally different situation. We would then put our thoughts into words, for example offering one of the following answers:
I did not like it - I did not like it much - I liked it so-so,
I liked it well - I liked it a lot.
We would then have evaluated the film in relation to what we want
from a film. If we want to be generous, we might say that we have
evaluated the quality of the film. (We have certainly not judged
it on the basis of its length!) A scale of words such as used
here, is called a qualitative scale of measurement. When we speak
about qualitative scales of measurement we must give the concept
of quality a very wide interpretation, so wide that almost everything
which is not an amount comes under quality. Thus comparative adjectives
offer numerous examples of qualitative scales: yellow - more yellow
- most yellow, nice - nicer - nicest, etc.
It is especially when we want to judge aspects of quality that
evaluation and subjective judgements become important. It may
then be useful to have a scale of measurements in order to express
an opinion. This facilitates the work for the evaluator, it guides
thinking, and it may improve communications when conveying the
results. The requirement is that the scale is a good scale for
what is to be measured.
For a scale to be good, it must
1. be linguistically appropriate for the matter at hand,
2. have a reasonable number of gradations or steps in relation to the matter at hand, and
3. have steps that constitute a comprehensible system.
If we test this on the qualitative scale formulated for the question
on how we enjoyed the film, we would offer the following comment:
1. The linguistic requirement: The response words used in the scale correspond to spoken language and are thus appropriate for the question, so the response alternatives are eminently serviceable. However, complete sentences are too cumbersome to use in questionnaires. We have to accept modifications, such as: 'I liked the film
not at all - poorly - so-so - well - very well'.
2. The number of steps: A scale of five steps is manageable
for the interviewee and offers sufficient gradation for most purposes.
Hence, this is generally an adequate number. You should have very
good reasons for preferring seven steps, as this would likely
yield more trouble than gain.
If the matter does not require a fine gradation of the responses,
offering only three alternatives may be the most reasonable choice,
for example: Poor - Fair - Good.
The scale should normally have a middle step offering an average
or neutral degree, and then offer one or two alternatives above
and below this alternative. If a neutral medium alternative is
lacking, the respondent may be forced into expressing a feeling
or meaning s/he perhaps does not hold.
3. The step system: A measuring scale may be compared to
a stairway leading from a lower level to a higher level. One essential
aspect of stairs is that each step is of equal height. If they
are not, you will be left with a feeling that the stairs are flawed
when you walk up or down them. We may also get this feeling when
we are given a qualitative measuring scale. We may 'feel' that
the steps of the scale do not form an evenly spaced system.
When we make steps with words, it may be hard to place the steps
equally far apart. The reason might be that the language simply
does not offer an adequate selection of words as there are not
enough words with clear, different values. It may, however, also
be that we do not strive hard enough to find the proper word.
Using digits instead of words offers the advantage that by definition
there is a step of one unit between each of them when they are printed
in order. Some people are tempted by this, requesting the respondent
to use a scale from one to five. This may, however, counteract
requirement no. 1, that the response should be appropriate for
natural speech. The responses should also be comprehensible to
you: they should agree with the intentions and ideas you
have.
Whether using digits will raise problems depends on the question
asked. If the question is 'How many points would you award melody
A?', there is no problem. If the question is 'How did you like
the film?', you must first give words to your thoughts then you
must decide which number corresponds to your opinion, and this
might not be so obvious. It is even less obvious whether everybody
will apply the digital scale equally. However, a combination of
digits and words might offer advantages worth considering. We
shall offer two examples of this.
A. If the questions all concern how well the respondent liked
some films that were shown, and there is a wish that a particular
qualitative scale should be used, this wish could be presented
at the very start of the questionnaire, for example this way:
When you state how well you liked the films, we would like you to use this scale:
Poor (1) - Not too well (2) - Well (3) - Very well (4) - Extremely well (5)
After each film these digits are printed in order, thus:
How well did you like 'The Good Earth'? 1 - 2 - 3 - 4 - 5
If you liked the film extremely well, you underline the digit
5, if you liked it very well, you underline the digit 4 and so
on.
The purpose is to have the respondent think in terms of a scale
of words, while using digits to facilitate matters.
At a glance, we understand that this will save extensive writing,
and it would probably reinforce the idea that this is a scale
with understandable steps, where 'Well' is a neutral middle value.
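If the responses are later processed electronically, the word-digit pairing of example A can be handled with a small script. The sketch below is our own illustration, not part of the NITOL material; the scale labels and the sample responses are assumptions:

```python
# Illustrative sketch: a 5-step qualitative scale paired with digits,
# as in example A. The labels and the sample data are hypothetical.
SCALE = {1: "Poor", 2: "Not too well", 3: "Well",
         4: "Very well", 5: "Extremely well"}

def summarise(responses):
    """Count how often each step was chosen and compute the mean step."""
    counts = {step: 0 for step in SCALE}
    for r in responses:
        if r not in SCALE:
            raise ValueError(f"Response {r} is outside the 1-5 scale")
        counts[r] += 1
    mean = sum(responses) / len(responses)
    return counts, mean

# Example: ten respondents underlining digits for 'The Good Earth'
counts, mean = summarise([3, 4, 5, 4, 3, 2, 4, 5, 3, 4])
for step, label in SCALE.items():
    print(f"{label} ({step}): {counts[step]}")
print(f"Mean step: {mean:.1f}")
```

The respondent thinks in words, while the processing works entirely in digits, which is precisely the division of labour the combined scale aims at.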
B. If the questions are more varied than in example A, both hard work and much space may be required to tailor a word scale for each question asked. You might therefore apply response phrases which are easier to use generally, and which do not seem out of place next to the question. We are thus willing to ease up on the language requirement to obtain another benefit. (Doing something like this should always be for some benefit or other!) A possible example of such a compromise could be:
far below average - below average - average - above average - far above average
(fba - ba - a - ab - fab)
We will offer examples of varied questions which lend themselves
to using the same response alternatives taken from the enclosed
questionnaire. We have modified some of the questions a little
(without changing their meaning) to adapt them to the answers.
1. How easy was it for you to use the course material? fba - ba - a - ab - fab
7. How interesting were the conference comments to you? fba - ba - a - ab - fab
11. How was your motivation during the course? fba - ba - a - ab - fab
14. Was it difficult for you to plan the course work? fba - ba - a - ab - fab
15. How much support were you given by course assistants/.... teachers? 1 - 2 - 3 - 4 - 5
16. How did you like to take this type of course? 1 - 2 - 3 - 4 - 5
17. To what extent did you feel that the course corresponded to your expectations? 1 - 2 - 3 - 4 - 5
18. How effective did you feel that it was as a learning experiment? 1 - 2 - 3 - 4 - 5
To stimulate your thinking about this, we have used
both digits and abbreviated words in the examples.
Clearly there might be an element of language distortion when
we want a more generalized scale. However, the questions might
be rephrased somewhat to get language relevance in the answers.
The above example demonstrates that this system might be space-saving,
and by reducing the number of qualitative measuring scales in
one and the same questionnaire, time is also saved for the designer,
the respondent and the person processing it afterwards. In other
words, one must evaluate even when creating evaluation questionnaires.
Among the many considerations:
The appearance of a questionnaire should invite answers! (Compare the example above with the relevant pages of the questionnaire below. Which of them invites your answer? There is something called lay-out ...)
Questionnaires should be made bearing in mind how the answers will be registered and later presented.
The questions should concern matters the respondent must be able to perceive as reasonable.
The questions should be simple and clear (which everybody agrees
with)
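The point about registering answers can be illustrated concretely. If answers are collected on the generalised fba-fab scale, a short script can translate them into digits for later presentation. This is our own sketch; the abbreviations follow the example in the text, but the answer data are hypothetical:

```python
# Illustrative sketch: registering answers given on the generalised
# 'far below average ... far above average' scale as the digits 1-5.
# The abbreviations follow the text; the answers are hypothetical.
CODES = {"fba": 1, "ba": 2, "a": 3, "ab": 4, "fab": 5}

def register(answers):
    """Translate abbreviated qualitative answers into numeric codes."""
    return [CODES[a] for a in answers]

# One respondent's answers to questions 1, 7, 11 and 14
coded = register(["ab", "a", "fab", "ba"])
print(coded)
```

Reusing one scale across many questions means one such translation table suffices for the whole questionnaire, which is where the time saving for the processor comes from.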
Obviously, this is the most important question to answer when
making a questionnaire. If we limit the discussion to NITOL courses
taught in an open and distance-based context and questionnaires
for these, the questions should address the issues we hold as
essential to just this type of course, and what we have focused
our efforts on. We shall here comment on what this may be, without
exhausting the subject, and without worrying about putting the
issues in the proper order. Let us therefore place some main elements
in frames to be regarded as pawns that can be moved. These pieces
can be enlarged or reduced according to what we want to emphasise
in an evaluation phase.
Work assignment:
The above frames offer some ideas! Examine the questionnaire
to ascertain whether it includes anything not featured here. Add
everything you come upon in new frames you make. You may also
supplement the frames with additional text. When reading the questionnaire,
place a question mark by those questions you consider to be not
clear enough, or which you consider unnecessary.
Other ways of evaluating
In collaborative programmes such as the one represented by NITOL, not
all evaluation can occur through printed forms and according to
given criteria. There will be a large amount of informal evaluation
during discussions which will follow naturally and of themselves.
If a joint programme for several institutions is to work, it requires
mutual discussions in the shape of board meetings, decisions,
formal and informal correspondence etc. The experience gained
concerning the use of IT as the means of conveying teaching programmes,
the collaboration on development of subject components, their
maintenance and many other aspects, are part of what will be included
in a total evaluation report on open and distance-based learning.
Student reactions and opinions will be part of this larger whole.
1. BACKGROUND
1. You are taking course number .......................................
Are you taking part as a
Registered student?
Regular, but not registered student?
Casual observer/participant?
What are your main reasons for participating?
2. PERSONAL INFORMATION
2. Gender: Male / Female

3. Age:

4. Do you have care responsibilities for others? (children, disabled persons, an elderly person etc.) Yes / No

5. (Yes/No for each)
Are you a full-time student at a university or college?
Part-time student?
Full-time employee?
Part-time employee?
Unemployed?
3. PREPARATION
6. Did anybody encourage you to take this NITOL course? Yes / No
7. Why are you attending this course?
a) Personal interest
b) Job preparation
c) Part of my academic education
d) Other reasons

8. Why are you taking this course as flexible/distance learning?
a) It was the only way to take this course
b) It suits me better than attending a regular course
c) I cannot follow regular lectures, as these do not fit my schedule
d) The course is not available where I live
e) Other reasons

9. Are you collaborating with others via the net during this course? Yes / No
If yes, in which way?
a) I discuss subject or syllabus problems
b) I discuss problems concerning course organisation (does the e-mail system work, the conferences etc.)
c) I collaborate on the net to compensate for being alone (isolated distance-learning student)
d) I collaborate to get information and advice from other students
e) Other reasons
10. Has anybody encouraged you to collaborate with others via the net during this course? Yes / No

11. Are you collaborating with other students to complete the course? Yes / No
If yes, are you sharing the responsibility for study organisation (picking up all the information, handing in exercises in time etc.)? Yes / No
12. How are you connected to the net?
a) Via the university, college or other school
b) Remote connection via modem
c) Other ways

13. Have you received any technical instruction for the equipment used during this course? Yes / No
If yes, what kind of training?

14. Was the training appropriate/good? Yes / No

15. If no, how could the training have been improved?

16. Can you get technical assistance if you need it? Yes / No

17. Can the technical training be improved? If yes, how?

18. Are you familiar with the technical equipment used? 1 - 2 - 3 - 4 - 5
(1. Familiar ..... 5. Not familiar)
DISTRIBUTION AND RESOURCE AVAILABILITY
19. Do you have access to the required equipment when you need it? Yes / No
If no, describe why you do not have access.

20. Which learning resources are available to you? (Computers, manuals, people etc.)

21. Do you have any problems getting hold of the resources? Yes / No

22. Do you think you are exploiting the resources optimally? Yes / No
If no, explain.

24. When in the day/week do you usually follow the courses?

25. How do you see the ideal open learning environment?

26. What demands would you set for the technology used to make an open learning system function well?

27. Is it your impression that our present technology satisfies these demands? Yes / No
COMPLETING THE COURSE
1. How easy or hard was it for you to use the course material?
Very easy
Relatively easy
Relatively hard
Very hard

2. Did you feel that you received adequate help when using the teaching material and assistance in solving any problems?
Yes
No

3. If you needed assistance with course material, where did you most often get it?
Colleagues, co-students etc.
Documentation
Through discussion conferences for the subject
Other conferences
Other electronic sources, e.g. on-line help or e-mail

4. How often did you read messages/comments on conferences or e-mail?
Several times daily
Daily
Weekly
Monthly

5. How often did you contribute to the conferences?
Often
Occasionally
Rarely
Never

6. If you did NOT use the conferences or RARELY used them, what was the reason?
Not enough time
They were too difficult to use
I felt it was a waste of time

7. How interesting were the comments on the conference for you?
Very interesting
Somewhat interesting
Not interesting

8. How often were conference comments useful for understanding the course or solving course problems?
Often
Occasionally
Rarely
Never

9. How many conferences are you reading?
None
1-5
6-10
More than 10

10. How often did you have technical problems with the teaching material or the conferences?
Often
Occasionally
Rarely
Never

11. How would you describe your motivation during the course?
Very high
High
Moderate
Low
Very low
12. When did you normally work on the course?
Before work
In the morning
During lunch
After lunch
After work
During weekends

13. Where did you work on the course?
At home
At work/the university/the college
Both places

14. How hard was it for you to plan course work?
Very easy
Relatively easy
Relatively hard
Very hard

15. How much support did you feel you received from course assistants/teachers?
Much
A good deal
Moderate
Little
None
AT THE END OF THE COURSE
16. How did you enjoy participating in this type of course?
Very much
OK
Not much

17. To what extent did you feel that the course corresponded to your expectations?
Better than expected
About as I expected
Less than expected

18. How effective did you feel such a self-study course was, as a learning experiment?
Very effective
Somewhat effective
Not effective

19. How would you rank this self-study course compared to other courses you have taken?
Much more effective
More effective
As effective
Less effective
Much less effective

20. Compared to other methods of obtaining information THERE and THEN (for example colleagues, co-students, documentation etc.), how would you rank the discussion conferences?
Much more effective
More effective
As effective
Less effective
Much less effective

21. Which of the following aspects of the NITOL project made this an effective learning experience for you? Indicate all aspects you felt had an influence.
I was able to study at my own pace
I was able to study when it suited me
The course contained all the information I needed
I liked sharing information on the discussion conferences
The discussion conferences gave me the information I needed

22. Which of the following aspects of the NITOL project made this a LESS effective learning experience for you? Indicate all aspects you felt had an influence.
It was hard to find enough time to participate adequately in the discussion conferences
I felt isolated
I needed more support and encouragement in order to be able to learn efficiently
The course did not include the information I needed
The discussion conferences did not give me the information I needed

24. Would you consider taking more courses of this type in the future?
Yes
Maybe
No