Testing Speaking: Methods, Techniques and Tips
Tim Dalby (Jeonju University)

Abstract: Assessing a student’s spoken English ability is something that many teachers do on a regular basis. However, we are not always aware that we may be hindering our students’ ability to communicate effectively. For example, testing a student’s conversational ability through an interview-style, teacher-student test has questionable validity. Even if an interview required the same skills as a conversation (which it does not), the unequal power relationship between the teacher and the student will affect the learner’s ability to have a natural conversation. In this short paper, the author discusses various methods of oral assessment, identifies their pitfalls and makes recommendations for better quality testing based on current literature and on research carried out at Jeonju University.

Keywords: Spoken language, Assessment

I Introduction

Assessing a learner’s ability to use spoken English is something that teachers do on a regular basis. Whether in formal, structured, interview-style tests or simply during everyday classroom discourse activities, there are many ways to evaluate how well a student is able to use the language productively. The results of such assessments hopefully lead to further instruction or are used as a basis for remedial work. Formative assessment such as this can allow a teacher to modify the course of study to suit the needs of the learners during the course. This paper will examine many of the different methods of assessing spoken language by first identifying the operations and conditions that are involved in the testing of oral interaction. Here, operations refer to the various skills and activities that learners may be asked to perform, and conditions are the circumstances under which the test is conducted. The ideas of content and construct validity, which are important to any test, will also be examined, along with ways to measure the quality of the learner’s output. Finally, some suggestions regarding oral testing in the Korean context will be put forward based on recent research carried out at Jeonju University.

It should be stated from the outset that the author does not believe that real spoken interaction can be tested in any way other than through speaking. The construction of abstract sentences using multiple-choice tests is an indirect method of testing that does not reflect real spoken language use. As Weir (1993: 31) clearly notes: ‘To test speaking ability we should require candidates to demonstrate their ability to use language in ways which are characteristic of interactive speech.’

II Operations

As stated earlier, operations are the various skills and activities that learners can perform using spoken English. Examples include storytelling, giving directions and participating in a meeting. Operations can be broadly divided into two camps: routine skills and improvisation skills (Bygate, 1987; Weir, 1993: 30-34), which will be examined further below.

1. Routine skills

Weir divides routine skills into two clear types (1993: 32). The first are information routines, where the learner is expected to convey information by giving descriptions, making comparisons or telling stories. Information routines can be expository, which involves sequencing and identifying a subject, or evaluative, which involves explanations, comparisons and preferences. Routine skills such as these can be assessed through a presentation or interview-style test – something that allows the learner a long turn. The second type is the interaction routine, where the learner is expected to interact with another person. Examples of interaction routines include telephone conversations, purchasing goods and making decisions. For assessment of these skills, a role-play or interview situation is optimal.

2. Improvisation skills

Routines can be planned, but improvisation requires learners to draw on more than rote memorization of suggested conversational routines. Bygate (1987) describes a range of improvisation skills, such as the ability to check meaning, make corrections and deal with communication breakdowns. In this respect there are two broad types of improvisation skill: the negotiation of meaning and the management of interaction. Negotiation of meaning is very loosely synonymous with showing understanding. It involves showing friendliness, checking that the other participant has understood correctly, and responding to requests for clarification. These operations can be built into a test by the examiner pretending to show a lack of understanding or simply by requesting clarification from the learner. Management of interaction is concerned with following the conventions of conversation, such as appropriate turn-taking and managing the topics of conversation.

III Conditions

The conditions under which the test is performed will obviously have an effect on the validity of the test and on the ability of the learners to produce spoken language to the best of their ability. Factors include: the amount of time allowed for processing a response (i.e. how much silence will be tolerated); the number of participants and the type of interaction; the physical setting; the topic; the complexity of the language; and the purpose of the test. In a good test, students will have the opportunity to show what they can do, rather than where they are deficient. Conditions that have an adverse effect on learners should be minimized as much as possible. As Weir (1993: 5) states, ‘In this way, learners might be motivated towards future learning experiences.’ Spratt (2005: 23) suggests that it is largely in the hands of teachers and institutions to decide how much of a positive or negative impact an exam will have. There is also an argument that if speaking is tested, then speaking will be taught (Taylor, 2005: 154).

IV Level of Output

As with any test, determining the criteria for assessment is essential in creating a fair and valid test. To have good content validity, a test must specify which spoken operations are relevant to the test takers and which conditions are likely to affect them. To have good construct validity, there needs to be sound empirical evidence that the chosen operations and conditions actually affect performance. This evidence may come either from primary research or from the vast body of assessment literature. Assessing speech reliably is more difficult than assessing writing because of the momentary nature of the interaction. In a writing test, a permanent record of the sample is produced; unless the interaction is recorded, this does not happen in oral testing. So what should the criteria for oral assessment be? When assessing non-native speakers we should not expect a higher standard of performance than from a native speaker – whose speech is littered with self-corrections, false starts, repetitions and circumlocutions (Weir, 1993: 40). Instead, a marking scheme is required, along with rater training, so that all examiners are clear about how to use the marking criteria. Any good set of marking criteria should take account of how to assess routine skills, improvisation skills and microlinguistic skills.

1. Assessing routine skills

Factors here include the time available to complete the task, the level of coherence in the discourse and the appropriateness of the production. When considering time constraints, what is a normal amount of time in which to complete a story or a product comparison? How well is the presentation of information organized? Was the candidate able to respond to the social setting of the task using appropriate formality, tone and role relationships?

2. Assessing improvisation skills

As suggested earlier, we can assess a learner’s ability to manage the agenda of the conversation by evaluating the degree to which the candidate is an active and flexible participant, using appropriate turn-taking skills and effective degrees of politeness. We can also assess his or her ability to negotiate meaning through effective compensation strategies when difficulties are encountered.

3. Assessing microlinguistic skills

For lower level students, an examiner may need to make an assessment of both the accuracy and the range of the utterance. Accuracy can include the use of appropriate grammar and vocabulary, while fluency is concerned with intelligibility. Assessment of these criteria can be analytic (i.e., each criterion is marked separately on a scale) or global, as used in the IELTS tests. An example of each is reproduced below. Each type of scale has advantages and disadvantages; a discussion of these is outside the scope of this paper.

Table 4.1 – an analytic marking scheme (Weir, 1993: 43-44)

Appropriateness
0. Unable to function in the spoken language.
1. Able to operate only in a very limited capacity: responses characterized by sociocultural inappropriateness.
2. Signs of developing attempts at response to role, setting, etc., but misunderstandings may occasionally arise through inappropriateness, particularly of sociocultural convention.
3. Almost no errors in the sociocultural conventions; errors not significant enough to be likely to cause social misunderstandings.

Adequacy of Vocabulary for Purpose
0. Vocabulary inadequate even for the most basic parts of the intended communication.
1. Vocabulary limited to that necessary to express simple elementary needs; inadequacy of vocabulary restricts topics of interaction to the most basic; perhaps frequent lexical inaccuracies and/or excessive repetition.
2. Some misunderstandings may arise through lexical inadequacy and inaccuracy; hesitation and circumlocution are frequent, though there are signs of a developing active vocabulary.
3. Almost no inadequacies or inaccuracies in vocabulary for the task. Only rare circumlocution.

Grammatical Accuracy
0. Unable to function in the spoken language; almost all grammatical patterns inaccurate, except for a few stock phrases.
1. Syntax is fragmented and there are frequent grammatical inadequacies; some patterns may be mastered but speech may be characterized by a telegraphic style and/or confusion of structural elements.
2. Some grammatical inaccuracies; developing control of major patterns, but sometimes unable to sustain coherence in longer utterances.
3. Almost no grammatical inaccuracies; occasional imperfect control of a few patterns.

Intelligibility
0. Severe and constant rhythm, intonation and pronunciation problems cause almost complete unintelligibility.
1. Strong interference from the L1 in rhythm, intonation and pronunciation; understanding is difficult, and achieved only after frequent repetition.
2. Rhythm, intonation and pronunciation require concentrated listening, but only occasional misunderstanding is caused or repetition required.
3. Articulation is reasonably comprehensible to native speakers; there may be a marked ‘foreign accent’ but almost no misunderstanding is caused and repetition is required only infrequently.

Fluency
0. Utterances halting, fragmentary and incoherent.
1. Utterances hesitant and often incomplete except in a few stock remarks and responses. Sentences are, for the most part, disjointed and restricted in length.
2. Signs of developing attempts at using cohesive devices, especially conjunctions. Utterances may still be hesitant, but are gaining in coherence, speed and length.
3. Utterances, whilst occasionally hesitant, are characterized by an evenness and flow hindered, very occasionally, by groping, rephrasing and circumlocutions. Intersentential connectors are used effectively as fillers.

Relevance and Adequacy of Content
0. Response irrelevant to the task set; totally inadequate response.
1. Response of limited relevance to the task set; possibly major gaps and/or pointless repetition.
2. Response for the most part relevant to the task set, though there may be some gaps or redundancy.
3. Relevant and adequate to the task set.

Table 4.2 – a global marking scheme (from: http://www.ielts.org/institutions/test_format_and_results.aspx)

Band 9 – Expert User: Has fully operational command of the language: appropriate, accurate and fluent with complete understanding.
Band 8 – Very Good User: Has fully operational command of the language with only occasional unsystematic inaccuracies and inappropriacies. Misunderstandings may occur in unfamiliar situations. Handles complex detailed argumentation well.
Band 7 – Good User: Has operational command of the language, though with occasional inaccuracies, inappropriacies and misunderstandings in some situations. Generally handles complex language well and understands detailed reasoning.
Band 6 – Competent User: Has generally effective command of the language despite some inaccuracies, inappropriacies and misunderstandings. Can use and understand fairly complex language, particularly in familiar situations.
Band 5 – Modest User: Has partial command of the language, coping with overall meaning in most situations, though is likely to make many mistakes. Should be able to handle basic communication in own field.
Band 4 – Limited User: Basic competence is limited to familiar situations. Has frequent problems in understanding and expression. Is not able to use complex language.
Band 3 – Extremely Limited User: Conveys and understands only general meaning in very familiar situations. Frequent breakdowns in communication occur.
Band 2 – Intermittent User: No real communication is possible except for the most basic information using isolated words or short formulae in familiar situations and to meet immediate needs. Has great difficulty understanding spoken and written English.
Band 1 – Non User: Essentially has no ability to use the language beyond possibly a few isolated words.
Band 0 – Did not attempt the test: No assessable information provided.
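For teachers who keep electronic records of their marking, an analytic scheme such as Table 4.1 translates naturally into a per-candidate profile of criterion scores. The short Python sketch below is purely illustrative: the criterion keys are shorthand labels invented here, and the unweighted mean is an assumption made for the example, since Weir (1993) does not prescribe whether or how the six criteria should be combined into a single mark.

# Illustrative only: record one candidate's analytic ratings (0-3 per criterion,
# as in Table 4.1) and summarize them. The unweighted mean is an assumption;
# the scheme itself does not specify how, or whether, criteria should be combined.

CRITERIA = [
    "appropriateness",
    "vocabulary",
    "grammatical_accuracy",
    "intelligibility",
    "fluency",
    "relevance_and_adequacy",
]

def summarize(ratings: dict) -> dict:
    """Return the per-criterion profile plus a simple overall mean."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"Missing ratings for: {missing}")
    bad = {c: v for c, v in ratings.items() if not 0 <= v <= 3}
    if bad:
        raise ValueError(f"Ratings must be between 0 and 3: {bad}")
    mean = sum(ratings[c] for c in CRITERIA) / len(CRITERIA)
    return {"profile": dict(ratings), "mean": round(mean, 2)}

# A hypothetical candidate
print(summarize({
    "appropriateness": 2,
    "vocabulary": 2,
    "grammatical_accuracy": 1,
    "intelligibility": 3,
    "fluency": 2,
    "relevance_and_adequacy": 2,
}))

Keeping the whole profile, rather than only the mean, preserves the diagnostic information that analytic scales are designed to provide.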

V Testing speaking at Jeonju University

Although Korean students spend many years studying English, starting in grade 3 of primary school, many feel unable to produce good English (Graves, 2008: 165). In a study at Jeonju University, Kristin Dalby (2009) found that the methods used by the teachers there fell short of good practice in several respects. Upon entering the university, learners take a placement test which places them in a ‘low’, ‘middle’ or ‘high’ level conversation class. Unfortunately, this placement test has no spoken component; instead, there is a reliance on a global assessment of ‘general’ English skills. During the semester, teachers use one-to-one interview-style speaking tests for both the midterm and final tests (comprising 70% of a student’s grade), a format that has been popular in oral testing since the 1950s (Luoma, 2004: 35). Unfortunately, this format is very unlike regular oral communication (Weir, 1993: 18) and was heavily criticized in the early 1990s (Alderson and Banerjee, 2002: 92). Most conversation is between equals, where the duties of the conversation, such as opening and closing, shifting topics and asking questions, are shared (Kormos, 1999: 166). An interview, however, is usually not between equals. Instead, it is an imbalanced exchange in which the interviewee does little other than answer questions (Weir, 1993: 36). In a regular conversation, turn-taking is shared and each participant has an equal turn. In an interview the interviewee always takes the longer turn, and the interviewer offers little reaction (Johnson and Tyler, in Alderson and Banerjee, 2002: 93). Furthermore, the behavior and gender of the interviewer affect the performance of the test-taker (Taylor and Wigglesworth, 2009: 328), either positively or negatively (Brown, 2003: 2).

In Dalby’s 2009 study, fewer than half of the teachers at Jeonju University interview students in pairs, and only two (out of twenty) teachers have students interacting in groups. Many researchers consider paired or group testing a good alternative to the one-to-one interview, as students play various speaker roles and fulfill a variety of responsibilities in the conversation, thereby giving the examiner broader evidence of the students’ interactional ability (Luoma, 2004: 37). More worrying was the finding that ‘teachers seem unaware of the complexities surrounding the testing of speaking, as 100% of them agreed that they are confident in their ability to orally assess their students’. In her recommendations, Dalby (2009: 10) identified three areas where improvements could be made. The first concerned teacher training (or rater training), which would require a certain standardization of both the marking scheme and the tasks (something many of the surveyed teachers were opposed to), as well as team testing of pairs or groups of students to improve rater reliability, much like the Cambridge Speaking Tests (Norton, 2005: 289). She also expressed the hope that teaming up would encourage teachers to co-produce tests, as test writing should not be a solitary activity (Weir, 1993: 19).
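Team testing also makes it straightforward to check rater reliability empirically. The sketch below is not taken from Dalby’s study; it simply illustrates, with hypothetical data, how the scores two raters give independently on a single 0-3 analytic criterion might be compared using exact agreement and Cohen’s kappa.

from collections import Counter

def exact_agreement(r1, r2):
    """Proportion of candidates to whom both raters gave the same score."""
    assert len(r1) == len(r2) and len(r1) > 0, "need paired, non-empty ratings"
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(r1)
    p_observed = exact_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    p_chance = sum((c1[s] / n) * (c2[s] / n) for s in set(r1) | set(r2))
    return 1.0 if p_chance == 1 else (p_observed - p_chance) / (1 - p_chance)

# Hypothetical fluency scores (0-3) from two raters for the same ten students
rater_a = [2, 3, 1, 2, 2, 0, 3, 1, 2, 2]
rater_b = [2, 3, 2, 2, 1, 0, 3, 1, 2, 3]
print(f"exact agreement: {exact_agreement(rater_a, rater_b):.2f}")
print(f"Cohen's kappa:   {cohens_kappa(rater_a, rater_b):.2f}")

Low agreement, or a kappa close to zero, would suggest that more rater training or a clearer marking scheme is needed before such scores are relied on for grading.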

VI Conclusion

Speaking tests assess speaking ability; other forms of assessment can only ever be indirect. To ensure that our tests are valid and reliable, we need to be clear about which operations we are testing. Once this is clear, it is possible to create tasks which assess a learner’s ability to perform those operations successfully. Coupled with clear criteria and favorable conditions, this makes it possible to create an environment in which learners can show their language abilities at their best, and so to obtain a more accurate measure of a learner’s performance in a real-world situation. As Weir (1993: 31) states,

‘The fewer features of the real-life activity we are able to include and the less direct the test, the more difficult it will be to translate performance in the test into statements about what candidates will be able to do with the language.’

References

Alderson, J. C. and Banerjee, J. (2002) ‘Language testing and assessment (Part 2)’. Language Teaching 35:2, 79-113.
Brown, A. (2003) ‘Interviewer variation and the co-construction of speaking proficiency’. Language Testing 20:1, 1-25.
Bygate, M. (1987) Speaking. Oxford: Oxford University Press.
Dalby, K. (2009) A critical analysis of the approach to the testing of speaking in a Korean university. Unpublished manuscript, University of Leicester, Leicester, UK.
Graves, K. (2008) ‘The language curriculum: A social contextual perspective’. Language Teaching 41:2, 147-181.
IELTS Band Scores [online]. Available from: http://www.ielts.org/institutions/test_format_and_results.aspx [accessed 14/10/09].
Kormos, J. (1999) ‘Simulating conversation in oral-proficiency assessment: a conversation analysis of role plays and non-scripted interviews in language exams’. Language Testing 16:2, 163-188.
Luoma, S. (2004) Assessing Speaking. Cambridge: Cambridge University Press.
Norton, J. (2005) ‘The paired format in the Cambridge Speaking Tests’. ELT Journal 59:4, 287-297.
Spratt, M. (2005) ‘Washback and the classroom – the implications for teaching and learning of studies of washback from exams’. Language Teaching Research 9:1, 5-29.
Taylor, L. (2005) ‘Washback and impact’. ELT Journal 59:2, 154-155.
Taylor, L. and Wigglesworth, G. (2009) ‘Are two heads better than one? Pair work in L2 assessment contexts’. Language Testing 26:3, 325-339.
Weir, C. J. (1993) Understanding and developing language tests. London: Prentice Hall.

Biography: Tim Dalby is the President of the Jeonju-North Jeolla Chapter of KOTESOL. He is a former National 1st Vice-President and was co-chair of the 2009 National Conference. He holds an M.A. in ELT from The University of Reading, UK and has taught in Korea, New Zealand and the Czech Republic in many contexts including teacher training, business English, general English, EAP, FCE, CAE, IELTS, TOEFL and TOEIC. He currently teaches and trains teachers at Jeonju University. Email: [email protected]
