Abstract: This lesson focuses on the development of evaluation tools - based on an example taken from an evaluation of NITOL courses
Authors: Bodil and Fin Ask
Date:
The course lessons remain the property of their authors. The
course participants may freely utilise the lessons for their personal
use. However, if they want to use the lessons for teaching or
other courses they must contact the author directly for a more
precise agreement.
Copyright: Bodil and Fin Ask / HIA
Collaboration in open learning does not stop at the door leading
to the topic of evaluation, so in this lesson we want to guide
you through this door. We will examine evaluation as a professional
subject and clarify it somewhat, but we will finish by considering
evaluation as an aspect of collaboration.
Evaluation is one of those alien-sounding words which eventually
take on native hues, but there is no well-established tradition
for it in this country. In the context of pedagogy it was initially
connected to the phenomenon of assigning grades in school. A teacher
awards grades after evaluating an essay or a mathematical assignment,
but grades are also awarded for more intangible things, for example
grades given for behaviour, diligence and orderliness. Thus we
can infer that there is a process of evaluation prior to awarding
a grade. It is feasible to evaluate a piece of work without giving
a grade for it, but a piece of work ought to have been evaluated
before a grade is awarded for it. A number of people in Norway
have expressed the wish to use the term assessment (vurdering
- which is more Norwegian) rather than the foreign-sounding
evaluation (evaluering), but this has not gained wide recognition.
There is no doubt that the former term precisely captures the core
of the term evaluation, but many people feel that it is not
as broad. Evaluation comprises both the assessment itself
and the conveying of the outcome. When we use the term evaluation,
we are making it clear that our line of thought agrees with today's
prevalent attitudes on assessment in the context of education.
Evaluation in contemporary education institutions covers far wider
an area than just the assessment of the students and their performance
in various subjects. It may equally well apply to the assessment
of teachers' performance, attitudes and significance, or even
to the school environment as a whole. Evaluation
could just as well be applied to the objectives for the institution's
activities as to the processes taking place inside it, obviously
also comprising the results of everything. Hence, it is no longer
only teachers who evaluate. No matter what we evaluate, however,
and no matter who is doing the evaluation, an inherent part of
the process is to convey the outcome.
Evaluation is always a question of assessing something - within
a loose or a more rigid framework. Admittedly, it is not a process
restricted to institutions of learning. We are all concerned with
evaluation every day, wherever we go. It would have been a constructive
step if we could remove the halo surrounding the positive connotations
of the term evaluation, and bring it down to a more mundane level
so that we could fully apply ourselves and our own thinking and
experience. Even if many people have doubts when they are asked
to make an evaluation, all of us are thoroughly familiar with
the matter at hand. We have been evaluating the world since we
were looking out at it from the cradle, conveying our assessment
by means of tears and smiles.
From a linguistic point of view then, evaluating means putting
a value on something. Initially this is a question of whether
you like or do not like something, what you yourself think. You
will then convey your view based on what you think. That is
usually not so hard, provided we feel we can trust the person we
are talking to, because then we dare to speak our mind. And of course,
we do have opinions - usually - about those topics that occupy
our attention.
In many situations we know clearly whether we like something or
not, and we know equally well why. However, this is not always
the case, and it is mildly interesting that it is not always easy
to state exactly why we 'feel' the way we do about the case in
question. We may be crystal clear on the conclusion, but the reasons
why may be diffuse. A psychological 'game' of thoughts and emotions
takes place in our conscious and unconscious mind, which in relaxed
situations may yield genuine causes and in stressed situations
rationalised causes, while we nevertheless always believe that
we are speaking the truth when we state that our reason is ...
In professional contexts where we proceed more carefully, the
psychological game may be simpler and the causes more real. One
part of maturing in a profession is training the ability to employ
professional, objective reasoning for the standpoint one adopts,
as well as the ability to comprehend and assess the reasons of
others. However, many professional issues may be so complex that
both time and effort are required in order to arrive at a reasonable
point of view.
Thus, generally speaking, it is perhaps sufficient to point out that assessment and the reasons for such an assessment are two separate matters, and that both of them are interesting in the context of evaluation.
In many contexts we use various tools for the evaluations we make.
If we want to judge such units as length, weight, volume or time,
we have developed measuring instruments to aid us. This probably
started when we measured distance in paces or feet. We may scoff
at this today, but it is still a better justification for stating
something about distance when we can say "I have paced it
out" instead of "I can tell by looking at it".
What we are striving to do here is to illuminate an essential
aspect of the evaluation process: If we are to evaluate something,
we need something to evaluate it against. Evaluation will always
be a comparison. A concrete and direct comparison is when we use
a balance scale to weigh something. At one end of the balance
we have the object with the unknown weight, call it A. At the
other end of the balance we have the object with a known weight,
call it B. The scales tell us whether A weighs as much as
B, more than B, or less than B.
Measuring a length A means comparing it with a known length B, which
may be a yardstick. When we have a suitable measure for what we
want to measure, we say that the measurement is valid.
Using the yardstick is a valid means of measuring length, but
not weight.
When a measurement can be performed accurately and reliably, we
say the measurement is reliable. The degree of reliability
is revealed when you measure the same object a number of times,
or when a number of people measure the same object. Pacing to
measure length is a valid way of measuring, but it is not sufficiently
reliable for cutting building materials.
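The difference between validity and reliability can be made concrete with a small numerical sketch. Assume, purely for illustration, that the same distance is paced out five times; the figures below are invented, and the spread of the repeated measurements then indicates how reliable pacing is:

```python
import statistics

# Hypothetical repeated measurements of the same distance, in metres,
# obtained by pacing it out five times.
paced = [12.1, 11.8, 12.5, 11.6, 12.4]

mean = statistics.mean(paced)
spread = statistics.stdev(paced)  # a large spread means low reliability

print(f"Mean: {mean:.2f} m, spread: {spread:.2f} m")
```

The measurement is valid (paces do measure length), yet the spread of roughly a third of a metre shows why it is not reliable enough for cutting building materials.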
Everything is perfectly easy when we are measuring objects that
can be measured with real measuring instruments, as validity and
reliability may be documented directly. However, this is rarely
the case when we want to evaluate something. For such purposes
we must often create our own measuring tools which will not always
be adequate in every way, but may nevertheless be better than
no measuring tools at all. Consider our calendar. We need a measurement
for Time, capital T. We have made a calendar that tells us we
live in 1997. But everybody knows that time did not commence 1997
years ago. We have selected an arbitrary zero point, and count the
years before and after. This works fine while we remain in our
cultural environment, but raises problems when we encounter the
Jewish calendar for example, which starts 3761 years before ours,
using another artificial zero point. Time commenced neither 1997
nor 5758 years ago. Our calendar is not a measuring instrument for
Time, but it will serve to keep track of the passing years.
Thus we may claim that we can make useful scales of measurement
even though they are not perfect. Consider some examples taken
from situations where evaluation is applied more in the sense
we use it.
In ski jumping, the style of the jump is measured on a scale ranging
from 1 to 20. Each jumper is judged against the perfect jump as
it has been described to serve as a measure. There are rules stating
what to deduct for and how much. Thus there are absolute requirements
that must be fulfilled if 20 is to be awarded for style, and these
apply regardless of wind and other weather conditions! Evaluating
against clear-cut norms like this is called absolute evaluation.
When cross-country ski athletes compete in a 30 kilometre race,
the person using the shortest time will win, though his/her time
would not have given more than a fifth place in another 30 km
race the week before! Evaluation of the individual thus occurs
in relation to the performance of others. This is relative
evaluation.
So what if we want to evaluate an essay? Is there a norm dictating
how an essay should be written? Mind you, this must be a norm
any teacher would subscribe to, and which can be followed for
any grade and any subject. Obviously there is no such norm. Evaluation
is left to a teacher's judgement, thus becoming mainly subjective
evaluation.
If, on the other hand, you have a standardised test in a subject
using questions requiring only True-False answers, and you also
have a key for the test stating how many correct answers are required
for one grade or another to be awarded on the appropriate scale
of grades, then you have a system of objective evaluation.
(Anybody grading the answers will arrive at the same results).
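The objectivity of such a test lies in the grading being purely mechanical. A minimal sketch, in which the answer key, the responses and the grade boundaries are all invented for illustration:

```python
# Illustrative sketch of objective evaluation: a True/False test scored
# against a fixed key, with a fixed table of grade boundaries.
# The key, the answers and the boundaries are hypothetical.
KEY = [True, False, True, True, False, True, False, True]

def grade(answers):
    """Count correct answers and look up the grade in a fixed table."""
    correct = sum(a == k for a, k in zip(answers, KEY))
    # Minimum number of correct answers required for each grade
    for minimum, g in [(7, "A"), (6, "B"), (5, "C"), (4, "D")]:
        if correct >= minimum:
            return correct, g
    return correct, "F"

# Any grader running this on the same answers gets the same result
print(grade([True, False, True, False, False, True, True, True]))
```

Because the key and the boundaries are fixed in advance, no judgement enters the grading step, which is exactly what makes the evaluation objective.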
Nevertheless, with all this in mind, when evaluating the matters we want to know about in the context of teaching, there is every reason to query the validity of the 'measuring tools' we utilise. Do they 'measure' what we think they measure?
The material in this section has been included to provide a broader
understanding of the processes of evaluation. Any evaluation we
perform may be judged and described by means of the concepts outlined
above. Now we will go on to addressing evaluation which is directly
relevant for anybody who wants to assess open-learning course
programmes. We intend to do this using an evaluation questionnaire
which has been utilised for evaluating what students think about
course programmes within NITOL.
At the back of this lesson you will find an evaluation questionnaire
which has been used for NITOL courses. It is quite comprehensive.
Most questions can nevertheless be answered quickly by ticking
off the yes/no box or another alternative. We would nevertheless
offer the following advice to those issuing a questionnaire: Make
it as brief as absolutely possible!
The purpose of the first section of this questionnaire is to establish
factual information about the persons attending the course, their
background and preparation for the course, and the resources available
to them at their place of work/study.
The second section has 15 personal questions to ascertain what
the working conditions have been like for the respondent, how
these have been utilised, and whether the respondent enjoyed working
on the course and whether or not he or she was motivated while
working on the course material. This is followed by eight questions
for course evaluation.
The evaluation questionnaire addresses areas which are typical
for open and flexible learning. The intention is that it should
illuminate how students utilise electronic network services, for
example e-mail and conferences. A further aim is to ascertain
whether students themselves take initiatives to establish networks,
so-called virtual learning environments with other students, or
whether they regard this as a waste of time. Purely technical
aspects are also evaluated considering background skills and use.
The availability of technical resources, which in this form of
open and flexible learning programmes is a necessity, is also
assessed.
A teaching programme so heavily based on technology as these NITOL
courses might quickly become difficult for some people. It is
of course also important to obtain information about this.
The overall intention has been to determine how end users/ students
think such open learning programmes work. Many of the questions,
however, could equally well have been given to supervisors/student
assistants and to course developers/instructors, and this has
in fact been done.
If we consider the types of expected answers in the evaluation
questionnaire, they are widely different. The purpose of the first
section is to collect facts and descriptive information about
and concerning the students. This is therefore not evaluative.
By ticking off alternatives for yes, no or others, students may
quickly cover a lot of ground. However, space has also been set
aside for students to explain why they tick off the selected alternatives.
The second section of the questionnaire requests evaluation of
various matters, and the response alternatives reflect this. We
encounter a series of different measurement scales, thus giving
rise to the question: Which are the requirements we should have
for a qualitative scale of measurement?
We stated above that measurements of various kinds might serve
as evaluation tools. The measuring of weight, length or volumes
utilises quantitative measuring scales, meaning that the result
is expressed as an amount. If the measurement we want concerns
the comparison of amounts, quantitative measurements may be good
tools.
However, if someone asks us how we liked a particular film we just saw we have a totally different situation. We would then put our thoughts into words, for example offering one of the following answers:
I did not like it - I did not like it much - I liked it so-so,
I liked it well - I liked it a lot.
We would then have evaluated the film in relation to what we want
from a film. If we want to be generous, we might say that we have
evaluated the quality of the film. (We have certainly not judged
it on the basis of its length!) A scale of words such as used
here, is called a qualitative scale of measurement. When we speak
about qualitative scales of measurement we must give the concept
of quality a very wide interpretation, so wide that almost everything
which is not an amount comes under quality. Thus comparative adjectives
offer numerous examples of qualitative scales: yellow - more yellow
- most yellow, nice - nicer - nicest, etc.
It is especially when we want to judge aspects of quality that
evaluation and subjective judgements become important. It may
then be useful to have a scale of measurements in order to express
an opinion. This facilitates the work for the evaluator, it guides
thinking, and it may improve communications when conveying the
results. The requirement is that the scale is a good scale for
what is to be measured.
For a scale to be good, it must
1. be linguistically appropriate for the matter at hand,
2. have a reasonable number of gradations or steps in relation to the matter at hand, and
3. have steps that constitute a comprehensible system.
If we test this on the qualitative scale formulated for the question
on how we enjoyed the film, we would offer the following comment:
1. The linguistic requirement: The response words used in the scale correspond to spoken language and are thus appropriate for the question, so the response alternatives are eminently serviceable. However, complete sentences are too cumbersome to use in questionnaires. We have to accept modifications, such as: 'I liked the film
not at all - poorly - so-so - well - very well'.
2. The number of steps: A scale of five steps is manageable
for the interviewee and offers sufficient gradation for most purposes.
Hence, this is generally an adequate number. You should have very
good reasons for preferring seven steps, as this would likely
yield more trouble than gain.
If the matter does not require a fine gradation of the responses,
offering only three alternatives may be the most reasonable choice,
for example: Poor - Fair - Good.
The scale should normally have a middle step offering an average
or neutral degree, and then offer one or two alternatives above
and below this alternative. If a neutral medium alternative is
lacking, the respondent may be forced into expressing a feeling
or meaning s/he perhaps does not hold.
3. The step system: A measuring scale may be compared to
a stairway leading from a lower level to a higher level. One essential
aspect of stairs is that each step is of equal height. If they
are not, you will be left with a feeling that the stairs are flawed
when you walk up or down them. We may also get this feeling when
we are given a qualitative measuring scale. We may 'feel' that
the steps of the scale do not form an evenly spaced system.
When we make steps with words, it may be hard to place the steps
equally far apart. The reason might be that the language simply
does not offer an adequate selection of words as there are not
enough words with clear, different values. It may, however, also
be that we do not strive hard enough to find the proper word.
Using digits instead of words offers the advantage that by definition
there is a step of one unit between each of them when they are printed
in order. Some people are tempted by this, requesting the respondent
to use a scale from one to five. This may, however, counteract
requirement no. 1, that the response should be appropriate for
natural speech. The responses should also be comprehensible to
you: they should agree with the intentions and ideas you
have.
Whether using digits will raise problems depends on the question
asked. If the question is 'How many points would you award melody
A?', there is no problem. If the question is 'How did you like
the film?', you must first give words to your thoughts then you
must decide which number corresponds to your opinion, and this
might not be so obvious. It is even less obvious whether everybody
will apply the digital scale equally. However, a combination of
digits and words might offer advantages worth considering. We
shall offer two examples of this.
A. If the questions all concern how well the respondent liked
some films that were shown, and there is a wish that a particular
qualitative scale should be used, this wish could be presented
at the very start of the questionnaire, for example this way:
When you state how well you liked the films, we would like you to use this scale:
Poor (1) - Not too well (2) - Well (3) - Very well (4) - Extremely well (5)
After each film these digits are printed in order, thus:
How well did you like 'The Good Earth'? 1 - 2 - 3 - 4 - 5
If you liked the film extremely well, you underline the digit
5, if you liked it very well, you underline the digit 4 and so
on.
The purpose is to have the respondent think in terms of a scale
of words, while using digits to facilitate matters.
At a glance, we understand that this will save extensive writing,
and it would probably reinforce the idea that this is a scale
with understandable steps, where 'Well' is a neutral middle value.
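If the responses are later processed electronically, the word-digit pairing of example A can be handled with a small script. The sketch below is our own illustration, not part of the NITOL material; the scale labels and the sample responses are assumptions:

```python
# Illustrative sketch: a 5-step qualitative scale paired with digits,
# as in example A. The labels and the sample data are hypothetical.
SCALE = {1: "Poor", 2: "Not too well", 3: "Well",
         4: "Very well", 5: "Extremely well"}

def summarise(responses):
    """Count how often each step was chosen and compute the mean step."""
    counts = {step: 0 for step in SCALE}
    for r in responses:
        if r not in SCALE:
            raise ValueError(f"Response {r} is outside the 1-5 scale")
        counts[r] += 1
    mean = sum(responses) / len(responses)
    return counts, mean

# Example: ten respondents underlining digits for 'The Good Earth'
counts, mean = summarise([3, 4, 5, 4, 3, 2, 4, 5, 3, 4])
for step, label in SCALE.items():
    print(f"{label} ({step}): {counts[step]}")
print(f"Mean step: {mean:.1f}")
```

The respondent thinks in words, while the processing works entirely in digits, which is precisely the division of labour the combined scale aims at.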
B. If the questions are more varied than in example A, both hard work and much space may be required to tailor a word scale for each question asked. You might therefore apply response phrases which are easier to use generally, and which do not seem out of place next to the question. We are thus willing to ease up on the language requirement to obtain another benefit. (Doing something like this should always be for some benefit or other!) A possible example of such a compromise could be:
far below average - below average - average - above average - far above average
(fba - ba - a - ab - fab)
We will offer examples of varied questions which lend themselves
to using the same response alternatives taken from the enclosed
questionnaire. We have modified some of the questions a little
(without changing their meaning) to adapt them to the answers.
1. How easy was it for you to use the course material? fba - ba - a - ab - fab
7. How interesting were the conference comments to you? fba - ba - a - ab - fab
11. How was your motivation during the course? fba - ba - a - ab - fab
14. Was it difficult for you to plan the course work? fba - ba - a - ab - fab
15. How much support were you given by course assistants/.... teachers? 1 - 2 - 3 - 4 - 5
16. How did you like to take this type of course? 1 - 2 - 3 - 4 - 5
17. To what extent did you feel that the course corresponded to your expectations? 1 - 2 - 3 - 4 - 5
18. How effective did you feel that it was as a learning experiment? 1 - 2 - 3 - 4 - 5
To stimulate your thinking about this, we have used
both digits and abbreviated words in the examples.
Clearly there might be an element of language distortion when
we want a more generalized scale. However, the questions might
be rephrased somewhat to get language relevance in the answers.
The above example demonstrates that this system might be space-saving,
and by reducing the number of qualitative measuring scales in
one and the same questionnaire, time is also saved for the designer,
the respondent and the person processing it afterwards. In other
words, one must evaluate even when creating evaluation questionnaires.
Among the many considerations:
The appearance of a questionnaire should invite answers! (Compare the example above with the relevant pages of the questionnaire below. Which of them invites your answer? There is something called lay-out ...)
Questionnaires should be made bearing in mind how the answers will be registered and later presented.
The questions should concern matters the respondent must be able to perceive as reasonable.
The questions should be simple and clear (which everybody agrees
with)
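The point about registering answers can be illustrated concretely. If answers are collected on the generalised fba-fab scale, a short script can translate them into digits for later presentation. This is our own sketch; the abbreviations follow the example in the text, but the answer data are hypothetical:

```python
# Illustrative sketch: registering answers given on the generalised
# 'far below average ... far above average' scale as the digits 1-5.
# The abbreviations follow the text; the answers are hypothetical.
CODES = {"fba": 1, "ba": 2, "a": 3, "ab": 4, "fab": 5}

def register(answers):
    """Translate abbreviated qualitative answers into numeric codes."""
    return [CODES[a] for a in answers]

# One respondent's answers to questions 1, 7, 11 and 14
coded = register(["ab", "a", "fab", "ba"])
print(coded)
```

Reusing one scale across many questions means one such translation table suffices for the whole questionnaire, which is where the time saving for the processor comes from.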
Obviously, this is the most important question to answer when
making a questionnaire. If we limit the discussion to NITOL courses
taught in an open and distance-based context and questionnaires
for these, the questions should address the issues we hold as
essential to just this type of course, and what we have focused
our efforts on. We shall here comment on what this may be, without
exhausting the subject, and without worrying about putting the
issues in the proper order. Let us therefore place some main elements
in frames to be regarded as pawns that can be moved. These pieces
can be enlarged or reduced according to what we want to emphasise
in an evaluation phase.
Work assignment:
The above frames offer some ideas! Examine the questionnaire
to ascertain whether it includes anything not featured here. Add
everything you come upon in new frames you make. You may also
supplement the frames with additional text. When reading the questionnaire,
place a question mark by those questions you consider to be not
clear enough, or which you consider unnecessary.
Other ways of evaluating
In collaborative programmes such as the one represented by NITOL, not
all evaluation can occur through printed forms and according to
given criteria. There will be a large amount of informal evaluation
during discussions which will follow naturally and of themselves.
If a joint programme for several institutions is to work, it requires
mutual discussions in the shape of board meetings, decisions,
formal and informal correspondence etc. The experience gained
concerning the use of IT as the means of conveying teaching programmes,
the collaboration on development of subject components, their
maintenance and many other aspects, are part of what will be included
in a total evaluation report on open and distance-based learning.
Student reactions and opinions will be part of this larger whole.
1. BACKGROUND
1. You are taking course number .......................................
Are you taking part as a
Registered student?
Regular, but not registered student?
Casual observer/participant?
What are your main reasons for participating?
2. PERSONAL INFORMATION
2. Gender: Male / Female

3. Age:

4. Do you have care responsibilities for others? (children, disabled persons, an elderly person etc.) Yes / No

5. (Yes/No for each)
Are you a full-time student at a university or college?
Part-time student?
Full-time employee?
Part-time employee?
Unemployed?
3. PREPARATION
6. Did anybody encourage you to take this NITOL course? Yes / No
7. Why are you attending this course?
a) Personal interest
b) Job preparation
c) Part of my academic education
d) Other reasons

8. Why are you taking this course as flexible/distance learning?
a) It was the only way to take this course
b) It suits me better than attending a regular course
c) I cannot follow regular lectures, as these do not fit my schedule
d) The course is not available where I live
e) Other reasons

9. Are you collaborating with others via the net during this course? Yes / No
If yes, in which way?
a) I discuss subject or syllabus problems
b) I discuss problems concerning course organisation (does the e-mail system work, the conferences etc.)
c) I collaborate on the net to compensate for being alone (isolated distance-learning student)
d) I collaborate to get information and advice from other students
e) Other reasons
10. Has anybody encouraged you to collaborate with others via the net during this course? Yes / No

11. Are you collaborating with other students to complete the course? Yes / No
If yes, are you sharing the responsibility for study organisation (picking up all the information, handing in exercises in time etc.)? Yes / No
12. How are you connected to the net?
a) Via the university, college or other school
b) Remote connection via modem
c) Other ways

13. Have you received any technical instruction for the equipment used during this course? Yes / No
If yes, what kind of training?

14. Was the training appropriate/good? Yes / No

15. If no, how could the training have been improved?

16. Can you get technical assistance if you need it? Yes / No

17. Can the technical training be improved? If yes, how?

18. Are you familiar with the technical equipment used? 1 - 2 - 3 - 4 - 5
(1. Familiar ..... 5. Not familiar)
DISTRIBUTION AND RESOURCE AVAILABILITY
19. Do you have access to the required equipment when you need it? Yes / No
If no, describe why you do not have access.

20. Which learning resources are available to you? (Computers, manuals, people etc.)

21. Do you have any problems getting hold of the resources? Yes / No

22. Do you think you are exploiting the resources optimally? Yes / No
If no, explain.

24. When in the day/week do you usually follow the courses?

25. How do you see the ideal open learning environment?

26. What demands would you set for the technology used to make an open learning system function well?

27. Is it your impression that our present technology satisfies these demands? Yes / No
COMPLETING THE COURSE
1. How easy or hard was it for you to use the course material?
Very easy
Relatively easy
Relatively hard
Very hard

2. Did you feel that you received adequate help when using the teaching material and assistance in solving any problems?
Yes
No

3. If you needed assistance with course material, where did you most often get it?
Colleagues, co-students etc.
Documentation
Through discussion conferences for the subject
Other conferences
Other electronic sources, e.g. on-line help or e-mail

4. How often did you read messages/comments on conferences or e-mail?
Several times daily
Daily
Weekly
Monthly

5. How often did you contribute to the conferences?
Often
Occasionally
Rarely
Never

6. If you did NOT use the conferences or RARELY used them, what was the reason?
Not enough time
They were too difficult to use
I felt it was a waste of time

7. How interesting were the comments on the conference for you?
Very interesting
Somewhat interesting
Not interesting

8. How often were conference comments useful for understanding the course or solving course problems?
Often
Occasionally
Rarely
Never

9. How many conferences are you reading?
None
1-5
6-10
More than 10

10. How often did you have technical problems with the teaching material or the conferences?
Often
Occasionally
Rarely
Never

11. How would you describe your motivation during the course?
Very high
High
Moderate
Low
Very low
12. When did you normally work on the course?
Before work
In the morning
During lunch
After lunch
After work
During weekends

13. Where did you work on the course?
At home
At work/the university/the college
Both places

14. How hard was it for you to plan course work?
Very easy
Relatively easy
Relatively hard
Very hard

15. How much support did you feel you received from course assistants/teachers?
Much
A good deal
Moderate
Little
None
AT THE END OF THE COURSE
16. How did you enjoy participating in this type of course?
Very much
OK
Not much

17. To what extent did you feel that the course corresponded to your expectations?
Better than expected
About as I expected
Less than expected

18. How effective did you feel such a self-study course was, as a learning experiment?
Very effective
Somewhat effective
Not effective

19. How would you rank this self-study course compared to other courses you have taken?
Much more effective
More effective
As effective
Less effective
Much less effective

20. Compared to other methods of obtaining information THERE and THEN (for example colleagues, co-students, documentation etc.), how would you rank the discussion conferences?
Much more effective
More effective
As effective
Less effective
Much less effective

21. Which of the following aspects of the NITOL project made this an effective learning experience for you? Indicate all aspects you felt had an influence.
I was able to study at my own pace
I was able to study when it suited me
The course contained all the information I needed
I liked sharing information on the discussion conferences
The discussion conferences gave me the information I needed

22. Which of the following aspects of the NITOL project made this a LESS effective learning experience for you? Indicate all aspects you felt had an influence.
It was hard to find enough time to participate adequately in the discussion conferences
I felt isolated
I needed more support and encouragement in order to be able to learn efficiently
The course did not include the information I needed
The discussion conferences did not give me the information I needed

24. Would you consider taking more courses of this type in the future?
Yes
Maybe
No