SWIMMING IN WORDS: CORPORA, TRANSLATION, AND LANGUAGE LEARNING

Federico Zanettin

0. Introduction

Translation can help learners develop reading and writing skills, as well as increase their cross-cultural and cross-linguistic awareness. Translating consists of interpreting a discourse in the language of the source text and re-interpreting it by creating another discourse in the language of the target text. By recasting discourse A into discourse B, learners manipulate language to a meaningful end, transforming a text originally created to fulfil a communicative function in the language of the source text into another which may have varying degrees of similarity to it according to the function of the target text, ranging from word-for-word transliteration to restatement. Seen from this perspective, translating between languages is in principle no different from translating from one language variety to another, or from one register to another: all involve a shift in perspective and in recipient design (see Newmark 1988, 1991; Snell-Hornby 1988; Hatim and Mason 1990; Bassnett and Lefevere 1990; Gentzler 1993). Corpora consisting of texts in two languages which are similar in subject and purpose allow not only for contrastive analysis of individual expressions but also provide learners with a mapping of the structures and strategies employed by the two language communities for "building discourse in different linguistic and socio-cultural settings" (Marmadou 1990: 564). In reading a text in the L1 and trying to formulate a suitable "equivalent" in the L2, or vice versa, learners have to strive to find the most appropriate words for the new audience. This is not simply a matter of terminological accuracy, but involves comparing higher-level cultural codes concerning conceptual and rhetorical structures. This paper presents a specific example of the use in a translation task of such "comparable corpora" - two collections of texts, one in L1 another in L2, selected on the basis of a criterion of equivalence and stored on computer.
I am distinguishing here between "comparable" and "parallel" corpora, two terms which overlap in much of the literature. The term "parallel corpus" is generally used to designate a collection of texts in language A and of their translations into language B: see for example Leech and Fligelstone (1992), Baker (1995), Marinai et al. (1991). The best known such collection is probably the proceedings of the Canadian Parliament (HANSARD), which are published in both French and English (the original text may be in either language).1 Corpora of this kind are generally aligned on a sentence-by-sentence or phrase-by-phrase basis, either through reference to a bilingual dictionary (Picchi 1991), through statistical elaboration (Langé and Bonnet 1994), or a combination of the two (Johansson et al. 1996), so that instances of any textual string can be retrieved along with its equivalents in the parallel text: such corpora have been extensively used as a basis for the creation of bi- or multilingual terminology databases and thesauri, and for developing machine translation software.2 The term "parallel corpus" has also been used, however, to refer to collections of texts which are not translations of each other, but are selected on the basis of analogous criteria. These may either be taken from different varieties of the same language (e.g. the various components of the ICE corpus, which are taken from different geographical varieties of English: Greenbaum 1992), or from different languages, for instance collections of laws in French and Danish (Dryber and Tournay 1990), collections of service encounters in British and Italian (Gavioli and Mansfield 1990), or collections of public signs from various English- and German-speaking countries (Snell-Hornby 1984). It is this latter type that I refer to as "comparable" corpora.3

In this paper I discuss the basic operations necessary to create and use small comparable corpora, outlining an experiment conducted with undergraduates to produce an English translation of an Italian newspaper article, and suggest ways in which the procedures involved may contribute to language learning. While in this case the translation was from the learners' native language (Italian) into English, the methodology would also seem appropriate to translation into the mother tongue. The objective was to write a text which would sound as if it had been taken from a British newspaper, with the aid of a corpus of comparable English and Italian newspaper texts and concordancing software. Example 1 shows the original Italian text and the translation of it into English made by one student: while the final product was individually written, much of the research using the corpus involved interaction with other learners.

Example 1

In vasca. Sorvegliato speciale e' Matt Biondi, che cerca di vincere l'oro per la terza volta consecutiva ai Giochi, sul gradino piu' alto del podio ben cinque volte nell'edizione '88. Si esibisce nei 50 e 100 stile libero, oltre che nella 4x100 stile libero. Re del mezzofondo e' l'australiano Kieren Perkins, primatista mondiale dei 400, 800 e 1.500 stile libero.

Swimming. Matt Biondi, the defending champion, will be trying to win gold in his third successive Olympic Games. After gaining no less than five gold medals in 1988, this time he is back to contest the 50 and 100m freestyle, and 4x100m freestyle. Kieren Perkins of Australia, the world record holder for the 400m, 800m, and 1,500m freestyle, is top performer over the longer distances.

I will go through the steps followed in making this translation, showing how, by contrasting similar formal features in the two corpora which however may differ functionally (false friends, loan words, near synonyms, metaphorical expressions, etc.), and by comparing functionally similar segments of text which may however differ in their formal realisations (rhetorical structures, contextualising information, logical connectors, terminology, etc.), learners can use relatively small comparable corpora for a variety of activities which can not only enhance the specific translation but also allow a wide range of learning to take place.

1. Making comparable corpora

Some of the most readily available sources of computerised text are newspapers, many of which are now available on the Internet, or commercialised on CD-ROM at an affordable price. A CD-ROM usually contains up to a year of issues (8 to 10 million words of text) from which selections can be downloaded to the user's hard disk. While not all CD-ROM and online newspaper services use the same search and retrieval software, there is a tendency towards standardisation and some basic operations are common to most of them. Any user (teacher or student) who is computer/network literate should be capable of creating collections of text from these sources.

The corpora used for the translation activity in question were derived from CD-ROMs of The Daily Telegraph, The Independent and Il Sole-24 Ore for 1992. The purpose was to create a corpus regarding one event (the '92 Olympics), from one domain (the sports section of these newspapers). The criteria for selecting the texts to be included in a comparable corpus depend on the purpose to which the corpora are to be put. If, for example, the user wants to investigate the use of a high-frequency word which supposedly serves the same function in the two languages, then probably any corpora of roughly similar materials would do. If, on the other hand, the purpose is to investigate how two different cultures treat similar topics within the same domain or genre, then the selection must accordingly target the same topic, domain or genre.

To retrieve articles from a newspaper CD-ROM, it is generally enough to specify keywords (a keyword being a string of characters, which may include wildcard characters such as "*" or "?"). In this case a first search was run using the keywords olympic* and olimp* in the English and Italian data respectively. This however found a number of articles which had little to do with the event itself: the words olympic and olympics also appeared in book reviews, and the adjective olimpico (which also means "calm") in a wide variety of other contexts.
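The keyword search just described can be sketched in a few lines of Python. This is only an illustration of wildcard matching in general, not the retrieval software supplied with the newspaper CD-ROMs: the function names (wildcard_to_regex, matching_articles) are hypothetical, the toy articles are invented, and it is assumed here that "*" stands for any run of word characters and "?" for exactly one.

```python
import re

def wildcard_to_regex(keyword):
    """Turn a keyword containing '*' or '?' into a whole-word, case-insensitive regex.

    Assumption for this sketch: '*' = any run of word characters, '?' = one character.
    """
    parts = []
    for ch in keyword:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)

def matching_articles(articles, keyword):
    """Return the titles of the (title, body) pairs whose body contains the keyword."""
    pattern = wildcard_to_regex(keyword)
    return [title for title, body in articles if pattern.search(body)]

# Invented toy data: two relevant articles and two of the unwanted kinds of hit.
english = [
    ("Swimming", "Matt Biondi returns to the Olympic pool."),
    ("Books", "A history of the Olympics, reviewed."),
    ("Cricket", "England collapsed after lunch."),
]
italian = [
    ("Nuoto", "Biondi cerca il terzo oro olimpico."),
    ("Politica", "Il ministro ha risposto con calma olimpica."),
]

print(matching_articles(english, "olympic*"))
print(matching_articles(italian, "olimp*"))
```

On this toy data the book review and the political article are retrieved along with the sports pieces, which is the kind of noise that a keyword alone cannot exclude.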
Since most search systems of newspaper CD-ROMs allow for queries to be restricted to particular parts of articles (headlines or body), as well as by author, date, and section of the newspaper, the search was rerun specifying a date span (June 1st to September 1st, the period in which the Olympic Games took place), and the sports section of the paper. This yielded 150 articles from The Independent (about 95,000 words), 307 from The Telegraph (160,000 words), and 77 from Il Sole-24 Ore (65,000 words), which were saved as ASCII files. Overall, the English corpus thus consisted of about 250,000 words, while the Italian was roughly a quarter of this size. These figures were sufficiently small to allow learners to become familiar with the texts (100,000 words are the equivalent of 40 pages or so). In teaching applications too much data can confuse the learner, reducing understanding of the relationship between citations and their contexts and at the same time increasing the number of citations to be dealt with (Gavioli, this volume).

2. Swimming and navigating

The process of querying a corpus through a concordancer may be described as "navigation" - using a metaphor employed by Internet "surfers" - since each citation displayed in a concordance can prompt further searches and lead to the discovery of unexpected features. Since the next step (i.e. what to look up next in the corpus) depends on the intuition of the user, learners have to develop strategies for navigating through the data they are dealing with. Rather than trying to report on particular "routes" which were followed, I shall focus on the kinds of navigational strategies which learners adopted with respect to the translation activity, which was based on one of the articles in the Italian corpus.4

Given that a concordance is a set of different contexts for the same word, an obvious starting point is to examine the contexts of words in the source text which are likely to be present in both corpora, such as proper names. A concordance of Biondi in the English corpus produced 35 occurrences (see appendix A), many of which were followed by a phrase giving information about the Olympic champion. Learners looked at these citations for possible translations of the first sentence of the source text, which states who Matt Biondi is, and what he is trying to do. These included:

Example 2

Biondi expects to be back to his best
Biondi is the big man of swimming,
Matt Biondi: Swimmer. Won five golds
Matt Biondi, the defending Olympic champion,
Matt Biondi, the defending champion
Matt Biondi, the first man to win seven swimming
Biondi, who gained five golds in 1988,
Matt Biondi, winner of five gold medals
Biondi, with five golds last time
MATT BIONDI will try to slip into his `Superman' guise

In the Italian article Matt Biondi is introduced as sorvegliato speciale. This is a phrase that belongs to the language of law, used to refer to a person under police surveillance. Here it is used metaphorically to convey the idea of Biondi, champion of the '88 Olympics, being under attack and defending his supremacy. Thus among the descriptions in the English corpus, "defending champion" seemed a feasible way of translating sorvegliato speciale. The other proper name in the Italian text is that of Kieren Perkins, often referred to as "Australia's Kieren Perkins" or "Kieren Perkins of Australia". This surprised one learner, who had hypothesised using "the Australian Kieren Perkins" in his translation. By generating sample concordances to compare the use of adjectives of nationality, country names as possessives, and of followed by the country name, it was found that the third of these forms was quite the most frequent when referring to contestants in the English corpus. This form was therefore selected as a translation. After proper names, a second strategy was to look for similar expressions and/or classes of expressions in the two corpora. Work with concordancing software favours an approach which starts from a relatively low level of text constituency - the behaviour of words (Brodine, this volume; Baker 1992). What the learner is typically looking for is something of the kind "how do you say this in English?" - the equivalent of a key word. For instance, in the first sentence of the Italian text, two more things are said about Biondi: che cerca di vincere l'oro per la terza volta consecutiva ai Giochi (lit. "who is trying to win the gold for the third consecutive time at the Games"), and sul gradino più alto del podio ben cinque volte nell'edizione '88 (lit. "on the highest step of the podium no less than five times in the '88 edition"). 
Concordances were therefore generated for the presumed English equivalents of the key words oro (gold), podio (podium), and consecutiva (consecutive). A concordance of gold* produced nearly 850 lines. When these were sorted by the words to the left/right and skimmed through, a number of patterns were noticed, for instance that one can "win/gain/earn/get the/a gold (medal)", or "win golds". By also generating a concordance of or* in the Italian corpus (109 lines) these expressions could be analysed contrastively. In Italian you can say "vincere/conquistare/prendere la/una medaglia d'oro" (lit. win/conquer/take the/a medal of gold), "vincere/conquistare/prendere un oro" (lit. win/conquer/take a gold), or "vincere/conquistare/prendere ori/medaglie d'oro" (lit. win/conquer/take golds/medals of gold). It was noted, however, that while in English you can "win gold" (ex. 3), Italian requires a definite article: "vincere l'oro" (ex. 4):

Example 3 (gold with win* or won immediately to the left, sorted by the word to the left of gold: every third citation)

1  way ahead. I badly wanted to win gold but I accept I probably won't now.' F
2  de open,' he said. 'Anyone can win gold.' Robb's Liverpool Harriers team-mate
3  run she destroyed the field to win gold with one of the greatest track perform
4  y, they sailed brilliantly to win gold. Windsurfers Barrie Edgington, 25, and
5  ke status in Turkey after winning gold in the 60kg in Seoul in 1988, had bee
6  heir tally, Romas Ubartas winning gold in the discus for Lithuania. When Atl
7  ed strong men to tears by winning gold in Munich aged 33. Brasher (3,000m ste
8  onze, and Ann Brightwell, who won gold and silver. So you can imagine how I f
9  and, for a long time, better won gold; when he was collecting bronze medals
10 here else - East German women won gold and silver in all events at the 1986
11 0m and 400m medley while Hong won gold in the 100m butterfly. BRITISH SWIMM
12 Mike McIntyre and Bryn Vaile won gold medals in South Korea four years ago,
13 Pattison, a naval officer, won gold in the Flying Dutchman class in 1968 a

Example 4 (or* with vinc* within two words to the left, sorted by the second word to the left: all citations)

1  ct Velasco _ non solo non si vince l'oro, ma non si arriva alla finale>. Nella
2  ablo Morales, un ragazzo che vince l'oro nei 100 farfalla a 28 anni (per il nu
3  Matt Biondi, che cerca di vincere l'oro per la terza volta consecutiva ai Gio
4  di ridicolo. Due bulgari vincitori dell'oro erano risultati positivi agli ster
5  to, io prendo un sabbatico e vinco l'oro>. Spitz, forse il piu' grande nuotato
6  ", Maurizio Damilano, che ha vinto l'oro nella 20 chilometri di marcia addirit
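The kind of query behind Examples 3 and 4 (a node word, a collocate required within a few words to its left, citations sorted by the left-hand word, and a sample of every third line) can be approximated as follows. This is a rough sketch rather than a description of the concordancing software the learners used: the function name concordance, its parameters and the crude tokenisation are assumptions made for illustration, and the sample text is adapted from citations quoted in this paper.

```python
import re

def concordance(text, node, left=None, within=1, width=30):
    """Return key-word-in-context lines for tokens matching the regex `node`.

    If `left` is given, a hit is kept only when one of the `within` tokens
    immediately before the node matches it. Lines are sorted by the word
    immediately to the left of the node.
    """
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    node_re = re.compile(node, re.IGNORECASE)
    left_re = re.compile(left, re.IGNORECASE) if left else None
    hits = []
    for i, (word, start, end) in enumerate(tokens):
        if not node_re.fullmatch(word):
            continue
        preceding = [tok[0] for tok in tokens[max(0, i - within):i]]
        if left_re and not any(left_re.fullmatch(w) for w in preceding):
            continue
        sort_key = preceding[-1].lower() if preceding else ""
        left_ctx = text[max(0, start - width):start]
        right_ctx = text[end:end + width]
        # Right-justify the left context so that the node words line up.
        hits.append((sort_key, f"{left_ctx:>{width}}{text[start:end]}{right_ctx}"))
    hits.sort(key=lambda hit: hit[0])
    return [line for _, line in hits]

# Toy text adapted from citations quoted in this paper.
sample = (
    "I badly wanted to win gold but I accept I probably won't now. "
    "Anyone can win gold. "
    "Hong won gold in the 100m butterfly. "
    "They gained the women's gold medal for hockey."
)

hits = concordance(sample, r"gold", left=r"win\w*|won", within=1)
for line in hits:
    print(line)

# "Every third citation": thin a long concordance by taking a regular sample.
print(hits[::3])
```

Setting within=2 instead of 1 corresponds to the looser condition used for the Italian data in Example 4, where the article l' intervenes between the verb and oro.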

Learners rapidly discovered that the next two key words in the first sentence of the source text, podio and consecutiva, had cognates in the English corpus, podium and consecutive. The question was: if there is a cognate form in English, is it a true or a false friend (Holmes and Guerra Ramos 1993; Partington 1995)? As can be seen from the following concordances (podium* in the English texts: ex. 5; podi* in the Italian texts: ex. 6) the sense of podium does in fact correspond to that of podio in this context:

Example 5 (every third citation)

1  ay Michael Carruth was standing on a podium in the boxing arena here, listening
2  t finish anywhere better than on the podium of an Olympic Games.' Ever since his
3  win. But I was proud to stand on the podium after a race like that. 'It's a gr
4  me away from a definite place on the podium after crushing Australia 98-65 in t
5  two Americans stood on the winner's podium to salute the anthem. True that duo
6  .60m. Despite his climbing on to the podium along with Zelezny and Raty, there w
7  as Skah stepped jauntily out to the podium matched that which accompanied his

Example 6 (every third citation)

1  e Abebe), conosceva la sua ascesa al podio (terzo posto) dopo un quarto d'ora qu
2  spettacolare autorita' la scalata al podio piu' alto del torneo a squadre costri
3  assullo e Bomprezzi sono lontani dal podio. Nella vela un avvio in sordina dopo
4  avoriti per il gradino piu' alto del podio Scarpa e Josefa Idem; outsider Rossi
5  era mista) sul gradino piu' alto del podio, la cinesina Zhang Shang. La seconda,
6  tiste azzurre che hanno sottratto il podio piu' alto alle tedesche. Da sinistra:
7  6 e 10,8 milioni. Il terzo posto sul podio equivale, quindi, a una vittoria in u
8  empio, se Michael Jordan salira' sul podio alla cerimonia di premiazione del tor
9  i di esilio, il Sud Africa torna sul podio: i tennisti Ferreira e Norval si sono

It was noticed, though, that there was a difference in the relative frequency with which the two terms occurred. There were 27 occurrences of podio in the Italian as opposed to only 22 of podium in the English corpus, even though the latter was four times as large. Inspecting the concordances highlighted that the expression in the source text, il gradino più alto del podio ("the highest step of the podium") is repeatedly used to mean "winning the gold medal" (6 occurrences: cf lines 4-5 in ex. 6), and that this figurative use did not occur in the English corpus. Consequently, the word podium was not used to translate this expression, with learners resorting instead to the less figurative but more adequately attested "gaining five gold medals" (ex. 7):

Example 7 (gain* with gold? within five words to the right, sorted by the word to the left of gain*: all citations)

1  king their first Olympic appearance, gained the women's gold medal for hockey a
2  omplete their convincing progress by gaining gold medals, but at least they are
3  olwill on how Spain and Germany made gains in the gold market in the hockey tou
4  lifting three times his body weight, gaining the gold medal for Turkey and then
5  d be in a week's time. Biondi, who gained five golds in 1988, will again be th
6  ll wrong.' Not so Tamas Darnyi, who gained Hungary's second swimming gold medal
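The difference in relative frequency between podio and podium noted above can be made explicit by normalising the raw counts for corpus size. The short calculation below is a worked illustration only: the helper name per_10k is hypothetical, and the corpus sizes are the approximate word totals reported in section 1 (65,000 words of Italian; 95,000 plus 160,000 words of English), so the results are indicative rather than exact.

```python
def per_10k(occurrences, corpus_words):
    """Occurrences per 10,000 running words."""
    return occurrences / corpus_words * 10_000

# Approximate corpus sizes as reported in section 1.
italian_words = 65_000
english_words = 95_000 + 160_000

podio = per_10k(27, italian_words)    # 27 occurrences of podio
podium = per_10k(22, english_words)   # 22 occurrences of podium

print(f"podio:  {podio:.2f} per 10,000 words")    # 4.15
print(f"podium: {podium:.2f} per 10,000 words")   # 0.86
print(f"ratio:  {podio / podium:.1f}")            # 4.8
```

On these figures podio is nearly five times as frequent, relative to corpus size, as podium, although the raw counts (27 and 22) look similar.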

The other cognate noted, consecutive, also posed the question of whether it was a false friend. One learner, who suspected that "successive" might be a better translation, compared the two Italian and the two English words (consecutivo/successivo vs consecutive/successive). The concordances showed that the English words were almost always preceded by a number, and seemed to be synonyms (ex. 8), while the two Italian words differed. Consecutivo (the citation form is the masculine singular) seems more or less equivalent to the English successive/consecutive, while successivo means something like "following/next", appearing two out of four times in the phrase gli anni successivi (lit. "the following years": ex. 9). (This led this student to also investigate the behaviour of following and next.)

Example 8 (every fifth citation)

1  beat him after a run of 10 consecutive defeats at the distance aroused an unmi
2  se we never knew existed. At successive Olympics, World Championships and Common
3  l of the same event in four successive Olympics? 13 Who came second behind Seb
4  rst time in 1986 and then in consecutive years, 1990 and 1991, but has had to w
5  a final total for the second consecutive year. Twelve months after breaking his
6  be if he is to win his third consecutive Olympic gold medal. He and his partner,
7  emselves seeking their third consecutive Olympic gold in the coxed pairs today.
8  owing Steve Redgrave's third successive rowing gold on Saturday, Johnny, 23, and
9  xercise where she does three consecutive back-flips, in which her hands never to
10 in 1920, has collected three successive gold medals. Redgrave bridged 72 years o

Example 9 (every third citation)

1  rl Lewis, per la terza volta consecutiva campione d'Olimpia, l'ha vinta al pri
2  ultimo ostacolo. Per 12 gare consecutive e dieci anni, fra il 197 e il 1982,
3  ano la prua al mito, tre ori consecutivi alle Olimpiadi insieme al timoniere D
4  in termini di monetizzazione successiva della medaglia d'oro, sono piu' lontan
5  difiche apportate negli anni successivi, riusci' a rendere competitivo.
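The observation that consecutive and successive are almost always preceded by a number can also be checked by tabulating the word immediately to the left of the node. The sketch below does this for snippets taken from the ten citations in Example 8; the helper names (left_neighbour, is_number) and the small list of number words are assumptions made for the illustration, and a sample of ten lines is of course only suggestive.

```python
import re
from collections import Counter

# A deliberately small, hypothetical list of cardinals and ordinals.
NUMBER_WORDS = {"two", "three", "four", "five", "second", "third", "fourth", "fifth"}

def left_neighbour(snippet, node=r"consecutive|successive"):
    """Return the token immediately before the first token matching `node`."""
    tokens = re.findall(r"[\w-]+", snippet.lower())
    for i, token in enumerate(tokens):
        if re.fullmatch(node, token) and i > 0:
            return tokens[i - 1]
    return None

def is_number(word):
    """True for digit strings and for the number words listed above."""
    return word is not None and (word.isdigit() or word in NUMBER_WORDS)

# Snippets from the citations in Example 8.
snippets = [
    "after a run of 10 consecutive defeats",
    "At successive Olympics",
    "in four successive Olympics",
    "and then in consecutive years",
    "for the second consecutive year",
    "to win his third consecutive Olympic gold medal",
    "seeking their third consecutive Olympic gold",
    "third successive rowing gold",
    "she does three consecutive back-flips",
    "has collected three successive gold medals",
]

tally = Counter(is_number(left_neighbour(s)) for s in snippets)
print(f"{tally[True]} of {len(snippets)} citations have a number to the left")
```

In this sample eight of the ten citations have a cardinal or ordinal immediately before the node, which is consistent with the pattern the learner noticed.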
Corpora of homogeneous texts allow not only comparison of the uses of individual words, but also of features of discourse structure and their realisation (Aston 1996). To translate the Italian In vasca, literally meaning "In the swimming pool", the strategy adopted was to examine its function as a headline which indicates that this news item is about swimming. Learners looked for expressions in the English corpus which might fulfil this function. A concordance of swim* quickly revealed that an equivalent headline appeared to be Swimming, 14 out of 102 instances of this form being found in headlines.5

Even where learners were relatively confident in proposing a translation, the corpus often surprised them. Rather than on terminological accuracy or functional appropriacy, this often involved focusing on style: is a certain phrase "native-like", would it be used in a British newspaper? In the second sentence of the source text, the expression 50 e 100 stile libero intuitively had a straightforward equivalent in "50 and 100 freestyle". With the corpus at their disposal, some students tried looking for numbers by typing *0 as a keyword, running the search on both the Italian and the English data. As this resulted in hundreds of citations, the search had to be narrowed by adding other characters to the string, looking for words matching the strings ?00 and *00m. This highlighted the fact that in the English corpus, race distances were expressed specifying the unit of length (m or metres), in such patterns as 100 metres freestyle or 100m freestyle, while in Italian they also appeared as 100 stile libero. The only exception was in coordinate constructions ("50 and 100 metres freestyle").

Some discoveries were purely accidental. While examining the data related to numbers, one student happened to notice the following citation (ex. 10):

Example 10

Games, is back to contest the 100 metres, but Carl Lewis, the man who inhe

of which he proceeded to view the enlarged context (ex. 11):

Example 11

Ben Johnson, banned for two years for drug abuse after the Seoul Games, is back to contest the 100 metres, but Carl Lewis, the man who inherited his gold medal, will not be there, or in the 200m, following his failure in the trials. In the 110m hurdles the world champion, Greg Foster, and world record holder, Roger Kingdom, also failed to come through the trials, as did Antonio Pettigrew, the world 400m champion, and Dan O'Brien, the world decathlon champion and subject of a massive pre-Olympic publicity campaign.

This is not an article about swimming, but nonetheless contained a couple of expressions which seemed particularly appropriate solutions to the problems of translating si esibisce (literally "exhibits him/herself") and primatista mondiale (literally "world record holder") in the source text, insofar as neither of these terms would seem to refer exclusively to swimming. Translating si esibisce had proved problematic, with searches being run in the Italian corpus for strings corresponding to synonyms such as gareggiare and disputare, and in the English corpus for cognates and functionally equivalent words to these, such as dispute and perform. However, a search in both corpora for perform* showed that out of 219 occurrences, only 21 were verb forms, the rest mainly being the noun performance (17 of these in the compound performance-enhancing drugs/substances, and 5 as a borrowed term in the Italian texts). Example 12 lists five typical examples:

Example 12

His magnificent performance at Tokyo attracted a warm tribute ...
ANOTHER masterly performance by Steve Redgrave and Matthew Pin ...
McKean's performance was reminiscent of other runs he ...
... Carruth's gold medal-winning performance from the ringside.
Moorhouse's performance tomorrow, and that of his team ...

Rather than attempting to rearrange the text to use this nominal form, this student decided to use the phrase "[he] is back to contest", which in example 11 refers to the runner Ben Johnson, but could easily be applied to the swimmer Matt Biondi. Chance discoveries were worth checking systematically, however. In the same paragraph on Johnson there also appeared the expression world record holder, which seemed the equivalent of primatista mondiale. Checking this equivalence by examining concordances for these expressions confirmed that they were used in similar contexts, but also showed that world record holder is usually preceded by the article (ex. 13):

Example 13 (world record holder, sorted by the first word to the left: alternate citations)

1  ound Said Aouita, the 1,500m world record holder prevented by injury from compe
2  w a double world champion and world record holder. There is also a men's 200m fr
3  Stewart, the 200m butterfly world record holder, said. Mike Barrowman, who hol
4  the Olympic title. The former world record holder said a combination of the heat
5  ved Leroy Burrell, the former world record holder who ran the second leg in the
6  year because the 400 metres world record holder had competed while banned for
7  the fact that the 400 metres world record holder had failed a drugs test. Confu
8  m backstroke. Jeff Rouse, the world record holder from Petersburg, set the early
9  be his year but he wanted the world record holder in the race to prove he could
10 European champion, three times world record holder By MIKE ROWBOTTO

In this respect it differed from Italian.

To sum up, the Olympics corpora allowed learners to contrast the source and target languages at various levels, from single words and phrases to discourse functions and organisations. By looking for patterns and regularities in the English and Italian texts and comparing the uses of words and expressions which they felt had some kind of relationship either within or across languages, they were able to find evidence to formulate and to support specific translation hypotheses. As a final step, they were invited to check how far the patterns they had observed in English might be generalisable to other contexts by comparing concordances from the Olympics corpus with data from a more general newspaper corpus (MicroConcord A: Scott and Johns 1993).

3. Learning to create meaning

Like any other process of discourse construction, translation involves creating meaning, the difference being that translation is "guided creation of meaning" (Halliday 1993: 15). Using comparable corpora in this process increases the available guidance, insofar as it means building up the text using ready-made chunks of language, which have been used in similar contexts on similar occasions, selecting those judged most appropriate to convey the desired meaning. This does not mean, however, that translating simply becomes a "cut and paste" activity. The various pieces found will rarely fit together exactly, but have to be adjusted and linked in order to create a new text which is more than just a patchwork of pieces stolen from elsewhere. For instance, in the first sentence of the target language text in example 1 above ("Matt Biondi, the defending champion, will be trying to win gold in his third successive Olympic Games"), only the nominal group "Matt Biondi, the defending champion" was pasted from the English corpus without modification. The rest of the sentence combined bits and pieces adapted from three different concordance lines:

... a third consecutive gold medal ... (ex. 5, line 1)
... won his third consecutive gold medal ... (ex. 5, line 7)
... to win gold in three successive Olympic Games ... (ex. 5, line 26)

None of these exactly matches the wording finally adopted. The learner not only has to select relevant chunks of language according to their perceived meaning, but also to shape them together according to the desired overall outcome. Instead of dealing, as is typically the case in translation activities, with individual lexical items and their grammatical combination, the learner works with contextualised multiword units, their adjustment and integration.
This process arguably parallels that engaged in by fluent writers, who must operate with multiword units in order to "perform at a level acceptable to native users" (Cowie 1992: 10; see also Aston, this volume). Adjusting and integrating such units may enable the learner to develop skills of accommodation in the light of the purposes of the discourse, and of monitoring of the outcome with respect to stylistic criteria. For instance, whereas in the source text the information that Biondi had won five golds in the 1988 Olympics is found in the first sentence, in the target text this information was moved to the second sentence in order to avoid using the word "gold" twice in the same sentence. These operations involve the creation of a coherent and cohesive text above the sentence level - a fundamental skill of writing as well as of translation. For instance, the source text provides two pieces of news, concerning the top performers in the short and middle distance races respectively, Matt Biondi and Kieren Perkins. The latter is said to be re del mezzofondo (literally, "king of the middle distance"). But while mezzofondo can function as a head noun in Italian ("middle distance race"), a concordance for mid* in the English corpus reveals that middle-distance (16 occurrences) is always used as a qualifier (e.g. middle-distance runner). The target text solves the problem by using an appositional structure ("over the longer distances") which parallels that of the first sentence about Biondi: besides being a phrase present in the English corpus, as shown by a concordance for distance*, it functions as a cohesive link. In this sense, the absence from the corpus of "middle distance" as a head noun functioned as a constraint obliging this learner to focus on the coherence of the target text as a whole, rather than simply as a series of sentences constructed from equivalent words and phrases.

4. Learning about the language and the culture

This very brief example of a translation activity suggests a variety of potential learning benefits which may derive from the use of comparable corpora. We have seen, for instance, that it may enhance awareness of the relationships between words which are possible translation equivalents in the two languages. Hypotheses as to equivalences may be derived from intuition, or from bilingual dictionaries, which can act as an interface between the languages concerned. But whereas dictionaries contain "predigested information" (Fontanelle 1994), comparable corpora of the kind presented here are simply collections of raw data: the words are not defined, but given meaning by their contexts. By supplying meaningful instances of real language in use whose full context is always available at the touch of a key, concordances offer the learner both greater safety of numbers and greater certainty of contextual appropriacy than do dictionary examples. The argument concerns grammar as well as lexis: while one feature which seems particularly difficult for learners to master from dictionaries or grammars is article use, the corpora provided clear evidence as to the use of determiners with the phrases "win gold", "world record holder" and their Italian equivalents. They similarly highlighted regularities in the use of proper names, showing the preferred constructions used to refer to a country of origin (cf 2 above; see also Bertaccini, this volume). Lastly, corpora also constitute a source of extralinguistic (world) knowledge, by providing information about people, places and institutions. The specific information drawn from these small comparable corpora must not of course be treated as generalised "facts". The corpora do not claim to be representative of any wider category of texts, and features typical of them should not be interpreted as necessarily typical of the language as a whole, or even of sports journalism.
They are reliable to the extent that specific features are attested with a certain frequency, in a set of texts which are credibly analogous. While certain findings, such as the preference in English for nominal constructions with performance over the use of the verb perform, may have a wider bearing, such hypotheses must always be checked prior to generalisation, and the learner needs to be wary of the influence which the specific composition of the corpus exerts on the meaning and function of items of all kinds (see Zorzi, this volume). It is important to be careful about what may well be domain-specific words, senses and distributions. For instance, the word golds occurs over 50 times in the English Olympics corpus. Yet gold is usually classed as an uncountable noun in EFL textbooks (see e.g. Fowler et al. 1983: 71), and there are no examples of golds at all in the much larger and more varied MicroConcord A newspaper corpus. This suggests that the plural golds may only have the specific sense of "gold medals" in reference to sporting events. In contrast, both gold and golden occur as adjectives in the Olympics corpus, golden being sometimes used where in Italian we find d'oro (as in ragazza d'oro: "golden girl"). The latter example suggests that whereas gold indicates the metal, golden indicates the metaphorical quality, a hypothesis which is in fact confirmed by a search for these words in the MicroConcord A corpus. While not immediately useful for the translation, where it was clear from the start that the English for medaglia d'oro was "gold medal", this finding exemplifies the way analyses of comparable corpora can throw up hypotheses which may be noted and investigated as learning spin-offs for future use. The findings of a search for an appropriate translation of re in re del mezzofondo (literally, "king of the middle distance") were also not used in constructing the TL text, but they again illustrate how the process of using the corpora can lead to unforeseen learning. In the English corpus, king was almost exclusively used to refer to Juan Carlos, the king of Spain.
The sole metaphorical instance referred to an activity rather than an individual: Basketball is the king in Lithuania and they were hoping to use the sport to strike a blow against the Commonwealth of Independent States - which they still regard as a symbol of the old Soviet regime.

Another possible translation of re was big man, a phrase found referring to Matt Biondi (see ex. 2 above). It was rejected because this use was not only metaphorical but also literal ("Biondi is the big man of swimming, standing 6ft 7in tall and the winner of six Olympic gold medals"). A further candidate was top performer, which had been noticed during another search. On searching for top, two occurrences of top dog were also found. However, both of these were in quoted statements, suggesting that this expression might be more typical of spoken registers, and top performer was the form eventually chosen. But as much if not more was probably learned from investigating hypotheses that were rejected than from examining those which were finally adopted in the target text.

5.

Conclusions

Using comparable corpora and concordancing software for translation activities can help learners gain insights into the languages and the cultures involved and develop their reading and writing skills. By its very nature, translation is an activity in which the negotiation of meaning is mediated through the written medium, where the learner interacts first as a reader with the producer of the source text and then as a writer with the recipient of the target text. While the interaction which takes place between learner, text and computer is in primis an individual activity, it does not exclude extension to an interactive classroom setting. Duff (1989) argues that many translation activities can form a basis for group work and oral discussion, in which learners engage in a meaningful negotiation of meaning and exchange of their own contributions. Comparable corpora can also be a springboard for use in these activities. Besides the reward of a more natural-sounding final text - Appendix B gives examples of other translations of the same source text carried out without the aid of the corpus - I have tried to show that much else can be achieved through their use.

Notes

1.

Other aligned parallel corpora include those being developed from EC official journals and telecommunication texts (McEnery and Wilson 1994), from the multilingual technical manuals of computer companies like IBM or Microsoft, as well as from more heterogeneous sources (Johansson et al. 1996).

2.

Whereas machine translation (MT) issues are not within the scope of this paper, it may be useful to point out briefly possible areas of overlap, insofar as much of the work carried out with parallel and comparable corpora has been intended to have applications in this field. Wills (1993) distinguishes between four different procedures which go under the heading of MT: (a) word-for-word substitution, (b) machine-aided human translation (MAHT), (c) human-aided machine translation (HAMT) and (d) fully automatic machine translation (FAMT). The first of these is simply “a form of the interlinear version known from the Middle Ages” (Wills 1993: 405) and is generally of little use; MAHT consists essentially in a word processor equipped with the capability of interfacing with dictionaries and terminological data banks, which “may contain [...] a device to specify certain words in certain contextual environments”; HAMT, which requires human intervention in either or both the editing and the post-editing phase, is the area where most commercial software has been developed, ranging from professional tools such as IBM Translation Manager/2, Trados Translator's Workbench or Globalink to less sophisticated programs like Microtac Language Assistants. HAMT is also the main field of application for “parallel corpora” of the HANSARD type, which has been used as a testbed for IBM’s TM/2 (Langé and Bonnet 1994; Somers 1993); FAMT, which requires a fully implemented formalisation of implicit knowledge, does not seem to be a real option, except perhaps in the case of very narrowly defined and formulaic linguistic domains, e.g. the METEO translation software for translating meteorological bulletins (Lewis 1992). Many of the techniques and procedures from MAHT and HAMT involve the use of parallel or comparable corpora, in ways that are relevant to machine-aided language learning and to the training of translators.

3.

Laffling (1992) calls this category "globally parallel corpora", while Baker (1995) refers to those involving different languages as "multilingual corpora", giving as an example the Council of Europe Multilingual Lexicography Project (COMLP), which has collected analogous texts in seven European languages. Baker (1993, 1995, 1996) also proposes a different use of the term "comparable corpora" from that adopted here, when she advocates the setting up of what she terms, "for lack of a better term", "comparable corpora", consisting of a corpus of texts originally written in language A and another corpus of texts translated into language A from different source language(s), covering "a similar domain, variety of language and time span". The study of such corpora, she argues, may cast light on the translation process. For further discussion of these terminological issues, see also Zanettin (1994); Picchi and Peters (1996).

4.

While in this case the text to be translated was selected from a pre-defined corpus, it is of course equally feasible to construct a corpus to aid translation of a particular preselected text or set of texts. If this is done, care should be taken to choose keywords which will select the appropriate text-type. In designing a corpus to translate the Matt Biondi article, for example, one might be tempted to select all articles about swimming from newspaper CD-ROMs, using swim*/swam and nuot* as keywords. Such a procedure would however retrieve too many articles which have little to do with swimming as a sport, and ignore many from the relevant domain of sports newspaper articles dealing with the Barcelona 1992 Olympics.
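The keyword problem described in this note can be illustrated with a small sketch. It is not the procedure used to build the corpora discussed here: the function names, the three sample sentences and the extra keywords (freestyle, metres, olympic*) are all invented for illustration, and the wildcard * is simply assumed to stand for any run of word characters.

```python
import re

def wildcard_to_regex(keyword):
    """Compile a keyword such as 'swim*' into a case-insensitive whole-word regex."""
    # Assumption: '*' means "any run of word characters", as in many concordancers.
    return re.compile(
        r"\b" + re.escape(keyword).replace(r"\*", r"\w*") + r"\b",
        re.IGNORECASE,
    )

def select_articles(articles, any_of, all_of=()):
    """Keep articles matching at least one keyword in any_of and every keyword in all_of."""
    any_res = [wildcard_to_regex(k) for k in any_of]
    all_res = [wildcard_to_regex(k) for k in all_of]
    return [
        a for a in articles
        if any(r.search(a) for r in any_res) and all(r.search(a) for r in all_res)
    ]

# Invented sample "articles", for illustration only.
articles = [
    "Biondi will swim the 50 metres freestyle at the Olympics in Barcelona.",
    "The economy is swimming in debt, analysts said.",
    "Perkins broke the 1500 metres record at the Olympic pool.",
]

# swim*/swam alone keeps the irrelevant second text and misses the third.
naive = select_articles(articles, ["swim*", "swam"])

# Adding event vocabulary and requiring a domain term narrows the selection.
narrowed = select_articles(
    articles, ["swim*", "freestyle", "metres"], all_of=["olympic*"]
)
```

In this toy case the naive query returns the first two texts, while the narrowed one returns the first and third. Which domain terms work best for a real newspaper archive would have to be found by trial and inspection.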

5.

This convention was confirmed by a search run on the MicroConcord A corpus (Scott and Johns 1993), which contains about one million words taken from all sections of The Independent. Here, "Swimming" appeared as a headline in 3 citations out of 9.

References

Aston, G. 1996. "Enriching the learning environment". In A. Wichmann, S. Fligelstone, T. McEnery and G. Knowles (eds), Corpora and language teaching. London: Longman.
Baker, M. 1992. In other words. London: Routledge.
Baker, M. 1993. "Corpus linguistics and translation studies". In Baker, M., G. Francis and E. Tognini Bonelli (eds), Text and technology: in honour of John Sinclair. Amsterdam: Benjamin. 233-252.
Baker, M. 1995. "Corpora in translation studies: an overview and some suggestions for future research". Target, 7. 223-243.
Baker, M. 1996. "Corpus-based translation studies - the challenges that lie ahead". Paper presented at Unity in Diversity? International translation studies conference, Dublin City University, 9-11 May 1996.
Bassnett, S. and A. Lefevere (eds) 1990. Translation, history and culture. London: Pinter.
Butler, C. (ed) 1992. Computers and written texts. Oxford: Blackwell.
Cowie, A.P. 1992. "Multiword lexical units and communicative language teaching". In P.J.L. Arnaud and H. Bejoint (eds), Vocabulary and applied linguistics. London: Macmillan. 1-12.
Dryberg, G. and J. Tournay 1990. "Définition des équivalents de traduction de termes économiques et juridiques sur la base de textes parallèles". Cahiers de lexicologie, 56-57. 261-274.
Duff, A. 1989. Translation. Oxford: Oxford University Press.
Fontanelle, T. 1994. "Towards the construction of a collocational database for translation students". Meta, 39/1. 47-58.
Fowler, W.S., J. Pidcock, R. Rycroft and G. Del Giudice 1983. Sprint: a complete English programme. London: Nelson.
Gavioli, L. and G. Mansfield (eds) 1990. The PIXI corpora: bookshop encounters in English and Italian. Bologna: Cooperativa Libraria Universitaria Editrice.
Gentzler, E. 1993. Contemporary translation theories. London: Routledge.
Greenbaum, S. 1992. "A new corpus of English: ICE". In J. Svartvik (ed.), Directions in corpus linguistics. Mouton: De Gruyter. 171-179.
Halliday, M.A.K. 1992. "Language theory and translation practice". Rivista internazionale di tecnica della traduzione, 0. 15-26.
Hatim, B. and I. Mason 1990. Discourse and the translator. London: Longman.
Holmes, J. and R. Guerra Ramos 1993. "False friends and reckless guessers: observing cognate recognition strategies". In T. Huckin and J. Coady (eds), Second language reading and vocabulary acquisition. Norwood, NJ: Ablex. 86-108.
Johansson, S., J. Ebeling and K. Hofland 1996. "Coding and aligning the English-Norwegian parallel corpus". In K. Aijmer, B. Altenberg and M. Johansson (eds), Languages in contrast. Lund: Lund University Press. 87-112.
Laffling, J. 1992. "On constructing a transfer dictionary for man and machine". Target, 4. 17-31.
Langé and Bonnet, 1994. "The multiple uses of parallel corpora". Paper presented at the 1st International Conference on Teaching and Language Corpora (TALC), Lancaster University, 10-13 April 1994.
Leech, G. and S. Fligelstone 1992. "Computers and corpus analysis". In Butler. 115-140.
Lewis, D. 1992. "Computers and translation". In Butler. 75-113.
Marinai, E., C. Peters and E. Picchi 1991. "Bilingual reference corpora: a system for parallel text retrieval". Using Corpora: Proceedings of the Seventh Annual Conference of the UW Centre for the New OED and Text Research. St. Catherine's College: Oxford. 63-70.
Marmadou, S. 1990. "Contrastive analysis at the discourse level and the communicative teaching of languages". In J. Fisiak (ed), Further insights into contrastive linguistic analysis. Amsterdam: Benjamin. 561-571.
McEnery, A. and A. Wilson 1994. "Corpora and translation: uses and future prospects". In M.A. Lorgnet (ed), Atti della fiera internazionale della traduzione II. Bologna: Cooperativa Libraria Universitaria Editrice. 311-343.
Newmark, P. 1988. A textbook of translation. London: Prentice-Hall.
Newmark, P. 1991. About translation. Clevedon: Multilingual Matters.
Partington, A. 1995. "'True friends are hard to find': a machine-assisted investigation of false, true and just plain unreliable 'friends'". Perspective Studies in Translatology. 95:1. 99-112.
Peters, C. and E. Picchi 1996. "Bilingual reference corpora for translators and translation studies". Paper presented at Unity in Diversity? International translation studies conference, Dublin City University, 9-11 May 1996.
Picchi, E. 1991. "DBT: a textual database system". In L. Cignoni and C. Peters (eds), Computational lexicology and lexicography. Linguistica computazionale, 7. 177-205.
Scott, M. and T. Johns 1993. MicroConcord ver. 1.0. Oxford: Oxford University Press.
Scott, M. and T. Johns 1993. MicroConcord - Corpus A: The Independent and The Independent on Sunday. Oxford: Oxford University Press. 93.
Snell-Hornby, M. 1984. "The linguistic structure of public directives in German and English". Multilingua, 4. 203-211.
Snell-Hornby, M. 1988. Translation studies: an integrated approach. Amsterdam: Benjamin.
Somers, H.L. 1993. "Current research in machine translation". Machine translation, 7. 231-246.
Wills, W. 1993. "Basic concepts of MT". Meta, 38. 403-413.
Zanettin, F. 1994. "Parallel words: designing a bilingual database for translation activities". In A. Wilson and T. McEnery (eds), Corpora in language education and research: a selection of papers from Talc 94. UCREL technical papers, 4. Lancaster: UCREL. 99-111.

Appendix A

1 in the centre lane. On his right was Biondi, 26, with Jager, 27, on his left. Th
2 s showed no respect for reputations. Biondi, 26, made a brave attempt to add to
3 force his way into the record books. Biondi, a giant of a man in both stature
4 1 sec. Foster was sixth in 22.52 and Biondi, a five-time winner in Seoul, missed
5 rged from the dive just behind Matt Biondi and Alexander Popov it did not augur
6 for Spain it was disappointment for Biondi and Evans in the 100m and 400m frees
7 s the United States favourites, Matt Biondi and Janet Evans, failed to retain th
8 ludes the outstanding talent of Matt Biondi and Tom Jager, of the United States.
9 ast night, which, on a day when Matt Biondi and Janet Evans were racing, was a s
10 lute the anthem. True that duo, Matt Biondi and Janet Evans, heard it often eno
11 st Borges was given the same time as Biondi but after the officials looked at th
12 ming golds were awarded to the pair, Biondi collecting five. Yesterday Evans ha
13 and my shoulders get cramped up.' Biondi expects to be back to his best by th
14 00m and 400m freestyle respectively. Biondi is the big man of swimming, standing
15 50-metre freestyle, and two relays. Biondi is hoping that a combination of less
16 son, Sergei Bubka, Mike Powell, Matt Biondi, Li Jing, Michael Jordan, too. Wha
17 DTL 30 JUL 92 / Olympics '92: Biondi must conquer the threat of Popov - S
18 reestyle title, writes Colin Gibson. Biondi (pictured), 27 the day after the Gam
19 for a run in the morning. Perversely Biondi's recent dip in form - he was third
20 and a bronze in Seoul MATT BIONDI returns to the Olympic fray in Barce
21 e of a factor of working too hard,' Biondi said. 'Fortunately, we're over the a
22 Foster (50m freestyle) might ruffle Biondi's supremacy. The women's chances ar
23 tsteps over the next fortnight? Matt Biondi: Swimmer. Won five golds, a silver
24 sprint freestyle relay to make Matt Biondi the first male to win seven swimming
25 tenders are the American duo of Matt Biondi, the defending Olympic champion, an
26 ailed to qualify for the final. Matt Biondi, the defending champion, and Tom Jag
27 too uspset to speak and walked out. Biondi's vulnerability was first hinted at
28 he was crying in the practice pool. Biondi, who was fifth, said: 'The prospect
29 ut they could be in a week's time. Biondi, who gained five golds in 1988, will
30 By COLIN GIBSON MATT BIONDI will try to slip into his 'Superman'
31 Alexander Popov. The main threat to Biondi will again be Popov and Tom Jager, w
32 der. Later in the week American Matt Biondi will be attempting to force his way
33 will face tough opposition from Matt Biondi, winner of five gold medals, includi
34 g this time around - but 26-year-old Biondi, with five golds last time and one
35 e point it looked very unlikely that Biondi would return as an individual compe
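For readers unfamiliar with concordance output of the kind shown in Appendix A, the following is a minimal sketch of how a keyword-in-context display can be produced. It is not how MicroConcord works internally: the function name kwic, the default width and the sample text are invented for illustration, and the wildcard * is assumed to cover letters, digits and hyphens. The lines are sorted on what follows the keyword, as the lines in Appendix A appear to be.

```python
import re

def kwic(text, pattern, width=36):
    """Return keyword-in-context lines for a wildcard pattern such as 'mid*'."""
    # Turn the concordancer-style wildcard into a regular expression:
    # '*' stands for any run of letters, digits or hyphens (an assumption).
    regex = re.compile(
        r"\b" + re.escape(pattern).replace(r"\*", r"[\w-]*") + r"\b",
        re.IGNORECASE,
    )
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    hits = []
    for m in regex.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    # Sort on the right-hand context so that recurring patterns sit together.
    hits.sort(key=lambda h: h[2].lower())
    # Right-align the left context so the keyword starts in the same column.
    return [f"{left:>{width}}{word}{right}" for left, word, right in hits]

# Invented sample text, for illustration only.
sample = (
    "Matt Biondi won five golds in Seoul. The middle-distance swimmer "
    "Kieren Perkins holds three world records. Biondi, 26, swims again."
)
for line in kwic(sample, "Biondi", width=20):
    print(line)
```

Calling kwic(sample, "mid*") in the same way retrieves the single occurrence of middle-distance, which mirrors the search for mid* described in the discussion of mezzofondo.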

Appendix B

The two translations of the source text below were respectively carried out (a) by a professional translator and native speaker of English without the help of reference instruments; (b) by an Italian learner of English using traditional reference tools.

A.

At the pool all eyes are on Matt Biondi who is trying to win the Gold for the third time running at the Games, having won no less than 5 times in 1988. He swims in the 50 and 100 metres freestyle, and in the 4 by 100 freestyle. King of the middle distances is the Australian Kieren Perkins, world record holder in the 400, 800, and 1500 metres freestyle.

B.

In the pool nearly unbeatable, Matt Biondi, who won 5 gold medals in the 1988 edition, tries to win his third gold medal in a row at the Games. Besides the 4 X 100 frestyle relay he will compete in the 50-m and 100-m freestyle. King of middle-distance races is Australian Kieren Perkins, world-record holder at 400-m, 800-m and 1,500-m freestyle.
