Original scientific paper 811.111’232’243

Željka Lj. Babić1
Univerzitet u Banjoj Luci, Filološki fakultet

CHANGING THE NATURE OF SPEAKING ASSESSMENT2

The question of whether assessing speaking as a separate skill within language-related courses at English departments is approached holistically has rarely been raised in the region. There is a general feeling that merely by testing what is usually known as fluency, accuracy, discourse management, or certain lexical and grammatical structures, we are able to pass some kind of judgement on the performance itself. By ignoring the necessity of continuous assessment in practice and relying on the “one test – one mark” principle, we sometimes pay very little attention to the personality factors of the examinees. This paper illustrates a small project which approaches the assessment (and testing) of speaking at universities from another angle – as a collaborative effort of teaching assistants.
Key words: validity, reliability, washback, rubrics, collaborative assessment

INTRODUCTION

There is almost an unwritten rule that an investigation of communicative competence should begin by defining what communication is. For the purpose of this paper, we have chosen Morrow’s (1977) definition, which lists seven features characterizing communication. These propose that communication

• is interaction-based
• is unpredictable in both form and message
• varies according to sociolinguistic discourse context
• is carried out under performance limitations such as fatigue, memory constraints, and unfavorable environmental conditions
• always has a purpose (to establish social relations, to express ideas and feelings)
• involves authentic, as opposed to textbook-contrived, language
• is judged to be successful or unsuccessful on the basis of actual outcomes (Morrow, in Rivera 1984: 39).

1 [email protected]
2 The paper was presented as an oral presentation entitled Am I doing it the right way? – assessing speaking at the 9th ELTA conference Teaching-Learning-Assessing: Strengthening the Links, held at the Faculty of Philosophy in Novi Sad on 8 and 9 April 2011.

These features were proposed a long time ago and have been revised and adjusted many times since. Nevertheless, they have been taken as the starting point for a small pilot project focused on introducing an alternative type of speaking assessment, since it seemed suitable to test some of the points in a particular real-life situation – in this case, in the English majors’ classroom. Bearing in mind that the project was a pilot one, and in order to establish whether a new approach to assessing speaking would actually work with students who usually do not want to speak unless asked questions, certain considerations were put before the teachers involved in the creation of the assessment rubrics. The research used self-designed rubrics adapted from Mertler’s (2001) design, retrieved from the following web page (http://rubistar.4teachers.org/index.php) and added as the Appendix. The rubrics show that speech is graded through holistic scoring (Bachman 1990), even though, with some of the students, objectified scoring might have worked better. Turning these features into something useful and applicable to our teaching assistants’ and students’ needs sometimes requires an enormous amount of effort, together with so many adaptations that it usually means introducing a whole new approach to assessment. This paper focuses on testing a collaborative approach to speaking assessment. The preliminary aim was to see whether the students participating in the project would make any progress, but the main aim was to test the assessment itself by looking at its reliability, validity and washback.

DESCRIPTION OF THE RESEARCH AND THE METHODOLOGY USED

Assessing speaking proves to be a challenging job for every teacher, because one has to know how to deal with those who are timid, shy, or simply too lazy to talk. Using the fact that teachers are required to attend several classes a term given by other teachers, we tried to use some of those classes for assessing speaking, the chosen classes being literature classes. Since classes are quite large, varying from 20 to 35 students, the focus was put on assessing students who usually do not want to participate in language classes devoted to speaking, but who, according to the literature teaching assistants, had participated quite successfully in discussions held in their classes.


The idea was to observe (and later score) the speaking performance of those students, because of the incongruent reports about the individual performance of some students that surfaced in informal conversations between members of the teaching staff.
The setting of the researched classroom was as follows: the students are not aware of being assessed, and they know that the TA is obliged to sit in the classroom, so the tension and the affective filter are kept as low as possible, even though one is quite aware that the students cannot be as relaxed as they are in “normal” classes. The literature teaching assistant treats the “visiting teaching assistant” as any other student, so the atmosphere is completely relaxed. During the discussion of a particular poem, literary text or novel, students show their fluency, vocabulary knowledge, grammar, and even willingness to participate. After the class, the teaching assistants discuss the performances and rate them.
This is a time-consuming task and, on the surface, seems futile. The main focus is on a performance which is subjective, dependent both on the teachers’ opinions and on the mood the students are in that day. Still, it gives each of the assessors a perspective on what they personally pay attention to and, perhaps, what they are missing. Two people sometimes hear the same thing in two different ways, usually paying attention to different things. The most important point is that, from the validity point of view, spoken language is used to measure the stage to which other forms of exercises have brought the speaker’s performance.
The positive sides of focusing on language itself can sometimes be seen in subjects such as literature. Our preliminary research showed that literature teachers paid very little attention to language. They were mainly focused on students’ rote learning of as many literary concepts as possible, which resulted in responses that were sometimes on the border of being unintelligible, yet which they still graded highly, focusing only on what they understood to have been said. Through this kind of peer assessment (teacher-to-teacher), we were able to discuss the good and the bad sides of both approaches and to use oral responses in literature for improving speaking skills by applying the same rules to each of the speaking tasks.
The students’ grades on language speaking tests went up (as did their literature grades), because, after some classes dedicated to consciousness-raising, and with their obvious newly acquired self-confidence and interest, they were able to perform better. There was quite a lot of negative energy from those who received bad grades on the speaking part of the final exam, because it was quite difficult to explain to them what had gone wrong with their literature oral exam (which they had previously passed with flying colors). But, at the end of the school year, the positive effect was that they had to improve their speaking performance in order to pass the exam, even though we should emphasize that the method used was rather radical. Still, as far as acquisition was concerned, when these two speaking tests were compared, the results were the same. If the grades varied, it was because the student had not prepared well for the literature part.


It must be added that, after the introduction of this type of assessment, every year there are a number of students who voice strong negative opinions about the way English literature is taught, for they say that more attention is paid to grammar than to literature itself. Even though it is usually the people who are a pace behind the rest of the group who complain, we have felt the need to draw attention to this issue.

CONCEPTS USED IN RESEARCH

Returning to the theoretical side which lies beneath this test, we will introduce the basic concepts used in researching the test’s usefulness: 1. validity, 2. reliability and 3. washback.
Bachman and Palmer (1996) refer to validity as whether a test measures what it is supposed to measure. Out of the different types of validity, the following are emphasized as the issues researched:
a) content validity, which measures whether the test covers all aspects of what it claims to measure;
b) face validity, which measures whether the items in the test seem to be realistic, authentic uses of what is being measured;
c) construct validity, which measures whether all the items seem to be measuring the same thing;
d) concurrent validity, which measures whether the students score on other measures of the construct, such as listening or reading comprehension, the same way they do on the test being used.
Reliability is connected with the trustworthiness of the test results. Bachman and Palmer (1996: 19) define it as “consistency in measurement”. One can distinguish between:
a) test/retest reliability, which measures the scoring results of the same test administered within a short time-frame with no instruction or feedback between testings;
b) internal consistency, a measure of consistency whereby, by splitting a test into two parts and comparing them, the user is led to believe that the items are most likely measuring the same thing;
c) inter-rater reliability, whereby two raters evaluating language use should agree with each other in their assessment results.
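Although the study itself relied on the assistants’ qualitative comparison of marks, a minimal numerical sketch may help to make two of the concepts above concrete: inter-rater reliability as the proportion of students to whom two raters give the same rubric mark, and concurrent validity as the correlation between scores on two measures of the same construct. The Python snippet below is purely illustrative; the score lists are invented placeholders, not data from this research.

```python
# Illustrative only: invented 1-5 rubric marks, not data from the study.
from statistics import mean

def percent_agreement(rater_a, rater_b):
    """Inter-rater reliability: share of students given the same mark by both raters."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

def pearson(x, y):
    """Correlation between two score lists (used here to gauge concurrent validity)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

ta_one = [4, 3, 5, 2, 4, 3, 4, 5]  # hypothetical marks from the first teaching assistant
ta_two = [4, 3, 4, 2, 4, 3, 5, 5]  # hypothetical marks from the second teaching assistant

print(percent_agreement(ta_one, ta_two))  # 0.75

# The same correlation function could relate scores on two measures of the
# construct (e.g. the speaking test and the literature oral exam):
print(pearson(ta_one, ta_two))            # about 0.87
```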


Washback (or backwash) encompasses the effect(s) of testing on teaching and learning, be they positive or negative. Negative effects include teaching only for the test and memorizing possible test questions. If the test is valid, then positive effects include focusing teaching on what is important. This is, at times, a debatable way of approaching the problem (used, among others, by Hughes 1989). Bachman and Palmer (1996: 30) insist that washback should be considered across all the dimensions involved in the processes of learning and teaching, thus bringing the students’ educational and societal systems within the scope of the term.

DATA DISCUSSION

Let us now discuss the qualities of the test by describing the concepts separately.

1. VALIDITY

We can say that concurrent validity is present where this type of assessment is concerned, for final grades rose on both the literature and the language tests. The scores on both tests have been alike, and I think one of the reasons for that is the fact that the students had previously been unaware of the differences in approaches to speaking. They approached it as if it were carried out in their mother tongue, which sometimes resulted in an incoherent set of fragmented utterances. Both the literature and the language teaching assistants have focused on certain vocabulary/grammar issues, chosen according to the needs of the syllabus, so I would suggest that construct validity is present up to a point, and, what is more important, positive washback is seen in making students use their knowledge of the acquired forms which have been in focus. It is quite difficult for me to justify content validity, for it is not always easy to find suitable texts (especially when dealing with literature from Anglo-Saxon times), so all the aspects needed are definitely not covered (especially vocabulary). Again, it is quite difficult to measure what has actually been taught, so this is a further reason for my doubts about the presence of content validity. Still, this should not be taken as an excuse for the evident difficulty in justifying content validity. The main reason is that this was a pilot project at first, and there were many students who had to be covered by the research, so certain modifications to the original postulates of the research had to be made. The teaching assistants have developed their own scale of performance, and the following categories are rated: presentation, participation, promptness/self-confidence and vocabulary/grammar.


The emphasis has always been placed on authentic literary texts, so, in that way, we can say that face validity is present. Still, the language used is sometimes quite difficult, which, with some students, has resulted in negative washback.
At this point, one has to mention the issue of inter-rater reliability. Even though there had been an intention to check the scale(s) which teaching assistants use when assessing speaking, this was not done due to frequent changes of the teaching assistants working on the specific programme. To be more precise, the most difficult problem has actually been trying to discover the reasons for giving particular grades to particular students. Even though the professors teaching language and literature subjects developed holistic rubrics for grading students, one somehow had the feeling that the rubrics were not used as they were supposed to be.

2. RELIABILITY

As far as inter-rater reliability is concerned, the teaching assistants give their marks according to the scale and compare them after giving a mark for each of the parts. The marks are very rarely in disproportion, and I have noticed that, when they are, it has usually been because of the subjectivity of one of the examiners. Inter-rater reliability was very difficult to achieve, almost impossible, due to frequent changes of the teaching assistants teaching both the language and the literature courses. My opinion is that the internal consistency of the test is really difficult to justify, because the texts (or topics) used are chosen according to the syllabus, so the only connection between them is the literary period they come from. Again, if we look at the statistics, there seems to be quite a lot of agreement between the tests administered by different teaching assistants. We have to voice some reservations about these results, because there was no firm proof that the different tests were actually constructed by different people. Retest reliability can also be questioned, because it depends on the students’ preparation for classes, even the mood they are in at the moment the assessment tasks are implemented, and their willingness to participate in discussions. The blame for that cannot always be put on the students, because I have found myself in numerous situations where I have chosen to avoid posing questions to someone who seemed timid and shy.

3. WASHBACK

We can say that positive and negative washback are strongly connected with what has been said above. There are quite strong cases of negative washback with those students who complain about being assessed and made to do exercises which will not be marked. They are focused on learning for the grades’ sake, not their own.


This is quite understandable from my point of view, because a lot of students are put under tremendous pressure to gain as high grades as possible, so they think that they will lack time for addressing other important issues when doing these kinds of exercises.
Positive washback is seen in the improvement in oral presentations and in the students’ attitudes towards language and literature. In this way, students are made aware of how they can connect their language knowledge with the needs of a literary oral essay/oral presentation. While speaking, they are encouraged to use the vocabulary presented in class, so that they gain self-confidence and fluency. By showing them the importance of connecting things, the influence of negative washback is diminished, for certain examples have shown that, once it was pointed out in which ways the exercises used would help improve their skills, students asked for more such classes. To be honest, the most positive washback on the part of the students was evident in the fact that the very presence of teaching assistants, who were treated as students in class, in a way put quite a lot of students at ease. The presence of someone to whom they usually attach a completely different role seemed to have woken up most of the people who normally spoke only when they had to. As soon as the teaching assistants who attended the classes started being treated as “a part of the group”, the students accepted them as such, tried to help them with difficult passages, encouraged them, and even answered certain questions for them – the surprising part being that the helpful students were those who were usually silent during classes. Later, the teaching assistants stated that, if nothing else, these classes helped enormously in building the confidence and trust useful for their own classes.

WHERE DO WE GO FROM HERE?

It would be very easy, after discussing some of the findings of this pilot research into a specific type of speaking assessment, to draw the conclusion that it was very useful and that such a way of assessing speaking worked out perfectly. Still, it was clearly stated at the beginning that such assessment was applied to a limited number of students. Furthermore, the only decisive point in whether someone would be tested or not was the impression shared by the teaching assistants that some students were better, more active, more talkative, more coherent, etc. during the classes where literature was taught than in the language classes directed at speaking. That is probably the reason for such good results. Again, due to lack of time, and the fact that a formal kind of assessment had to be administered in the form of the final test, the question remains open whether all this was necessary. The honest answer on my part is YES. It is easy to justify this answer. There are benefits for both students and teaching assistants. Getting personally involved in creating assessment tools in a way brings teachers closer to the needs and circumstances of their students.


The gap which exists between them is narrowed immensely, and this can be seen as a good starting point towards the common goal – learner-centered assessment. Students benefit because they feel understood, and they also see the connection between the different courses they have to take. The usual complaint of “why learn literature in such a way” seems self-explanatory, because this is practical proof that everything in English classes should be, and definitely is, connected. There remain a lot of ifs and whens. Still, this being a pilot project, the feeling at the end of the research is that a lot can be learnt from these small attempts to emphasize the need to change the face of present-day assessment habits as far as speaking is concerned.

References:
Бакман, Палмер 1996: L. F. Bachman, A. S. Palmer, Test usefulness: Qualities of language tests, in: L. Bachman and A. Palmer, Language testing in practice, Hong Kong: Oxford University Press, 17-43.
Мертлер 2001: C. A. Mertler, Designing scoring rubrics for your classroom, Practical Assessment, Research & Evaluation, 7(25). ‹http://PAREonline.net/getvn.asp?v=7&n=25›. 03.03.2004.
Моро 1977: K. E. Morrow, Techniques of evaluation for a notional syllabus, London: University of Reading Centre for Applied Language Studies.
Ривера 1984: C. Rivera, Communicative Competence Approaches to Language Proficiency Assessment: Research and Application, Clevedon: Multilingual Matters.
Сук 2003: K. H. Sook, L2 Language assessment in the Korean classroom, Asian EFL Journal. ‹http://www.asian-efl-journal.com/dec_03_gl.pdf›. 07.03.2004.
Хјуз 1989: A. Hughes, Testing for Language Teachers, Cambridge: Cambridge University Press.

Appendix

Oral Performance Rubric: Teacher/Student Conference
Teacher Name: _________________________________________
Student Name: _________________________________________


Each category is scored from 5 (highest) to 1 (lowest):

CATEGORY: Presentation
5 – Utterances self-initiated.
4 – Utterances mainly self-initiated.
3 – Utterances not always self-initiated.
2 – Needs encouragement.
1 – Has difficulties with all tasks.

CATEGORY: Participation
5 – Willing to participate, needs no encouragement.
4 – Willing to participate if directly given a certain task.
3 – Willing to participate if asked by the teacher or a fellow colleague.
2 – Trying to avoid participation by using short answers.
1 – Needs to be encouraged constantly.

CATEGORY: Promptness/self-confidence
5 – Absence of hesitation.
4 – Some hesitation present.
3 – Answers most questions with little hesitation.
2 – Noticeable hesitation when answering the questions.
1 – Frequent and long hesitations.

CATEGORY: Vocabulary/Grammar
5 – Adequate vocabulary and grammar, almost native-like.
4 – A few minor errors present but they do not influence communication.
3 – Adequate vocabulary and grammar.
2 – Major errors noticeable throughout.
1 – Major errors which influence communication.

Score: ______
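For illustration only, the rubric above can also be recorded in a simple data structure and combined into a single overall mark. The equal weighting used in the sketch below is an assumption made for the example, not the holistic scoring procedure actually applied by the teaching assistants in the study.

```python
# Illustrative sketch of recording the Appendix rubric; the equal weighting
# is an assumption for the example, not the study's actual procedure.
CATEGORIES = (
    "Presentation",
    "Participation",
    "Promptness/self-confidence",
    "Vocabulary/Grammar",
)

def holistic_mark(scores):
    """Combine the four 1-5 category scores into one overall mark (simple mean)."""
    if set(scores) != set(CATEGORIES):
        raise ValueError("a score is required for every rubric category")
    if not all(1 <= value <= 5 for value in scores.values()):
        raise ValueError("rubric scores must lie between 1 and 5")
    return sum(scores.values()) / len(scores)

# Hypothetical marks for one student observed during a literature class:
student = {
    "Presentation": 4,
    "Participation": 5,
    "Promptness/self-confidence": 3,
    "Vocabulary/Grammar": 4,
}

print(holistic_mark(student))  # 4.0
```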

Željka Babić / Changing the Nature of Speaking Assessment
Summary / The problem of assessing speaking as a separate skill is an area which raises many questions for teachers of English as a foreign language at university level. Speaking is primarily seen as a skill inseparable from the other three integrated skills, so it is assigned a role (both in teaching and in assessment itself) equal to that of the other skills, regardless of whether they are receptive or productive. This paper attempts to establish whether the basic assessment parameters – validity, reliability and washback – are affected by a change in the way speaking is taught in English language classes, and by a particular kind of assessment, collaborative assessment, carried out with the help of specially prepared rubrics. The conclusion of the pilot study clearly shows that this way of assessing can help lower the affective filter and thereby improve both the acquisition of structures and their unhindered production. This approach to assessment can be taken as a starting point towards the approach that contemporary applied linguistics strives for – an approach oriented towards each individual student.
Key words: validity, reliability, washback, rubrics, collaborative assessment.
Received: 15 July 2013
Accepted for publication: July 2013
