CEXI: Designing an English Italian Translational Corpus Federico Zanettin (Bologna, Italy)

Abstract

This paper describes the first phase of the CEXI project at the University of Bologna in Forlì, involving the selection of the texts to be included in the corpus and decisions about the processing of these texts. The aim of the project is to create a resource which can be used by both students and researchers to learn about translation and translating. The English Italian Translational Corpus can be described as a bi-directional, parallel, translation-driven corpus, which in its core component will consist of about 4.6 million words, or 368 text samples of 10 to 15 thousand words each, from Italian original texts and their translations into English and vice versa, published between 1975 and 2000. The core corpus will subsequently be flanked with additional unidirectional parallel collections which will better reflect the specific characteristics of the two very different translation populations sampled, as well as expanded to include full texts and text types excluded from the core component. The paper deals with issues such as representativeness, balance and directionality with respect to the Italian and English language book publishing sector, detailing the composition of the different sub-components of CEXI, and showing that the creation of a corpus involves a series of compromises between what is ideally desirable and what is possible given practical and theoretical limitations.

1. Introduction

In this paper I introduce a project aiming to construct a bilingual corpus at the School for Translators and Interpreters of the University of Bologna in Forlì, focusing on some corpus design issues.[1] The project is called CEXI, where C stands for Corpus, E for English, I for Italian. The X stands for cross or translational, and iconically represents the relationships between the four corpus components. The corpus is designed to be bilingual, parallel, bi-directional and translation-driven. It is bilingual in that the languages are English and Italian, parallel in that every source text contained in the corpus has its corresponding translation, and bi-directional in that half of the translated texts in the corpus are English translations of Italian texts and half of them are Italian translations of English texts. Translation-driven means that when assessing the textual population to be sampled and selecting the actual texts to be included in the corpus it is the target texts, i.e. the translational components of the corpus, which are taken as a starting point rather than vice versa (cf. Zanettin 2000).

[1] See the project web site at http://www.sitlec.unibo.it/cexi.

The project aims at creating a resource for language learning and translator training, with both a descriptive and an applied perspective in mind. CEXI is to be a resource for learning about language, culture and translation and for learning to read, write and translate. Given the project’s time frame, human resources and funds available, we initially set a size limit for the corpus of approximately four million words. A further restriction concerns the availability of texts for reproduction in electronic format: once the texts to be included in the corpus have been selected, they have to be located and acquired. Before the texts can be converted into electronic format, copyright permissions need to be obtained, and copyright clearance is a notoriously laborious and hazardous process, especially in the case of parallel corpora, as permission has to be sought for text pairs rather than single texts.

The design of the corpus is based on that of the English Norwegian Parallel Corpus (ENPC, cf., e.g., Johansson/Hofland 1994, Johansson 1998, Johansson/Ebeling/Oksefjell 1999). Ideally, its composition should allow not only for the analysis of parallel concordances (once source and target texts have been aligned), but also for the comparison of (a.) translated vs. non-translated texts in the same language, for both languages (cf. e.g., Baker 1995, 1996, Laviosa 1997, 1998, Mauranen 2000); (b.) original texts across languages (see, e.g., Zanettin 1998, Gavioli/Zanettin 2000); and (c.) translations across languages. Such a model, however, tends to blur the distinctive characteristics of the individual components, which in reality cannot be assumed to mirror each other precisely (Zanettin 2000, Johansson forthcoming).
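The four components and the comparisons they are meant to support can be laid out as a small data sketch. The component labels and the function name below are illustrative assumptions, not part of the CEXI design; the sketch only enumerates which pairing of components corresponds to which type of comparison.

```python
from itertools import combinations

# The four components of a bi-directional parallel corpus.
# Labels are hypothetical shorthand introduced for this sketch.
COMPONENTS = {
    "IT-orig": ("Italian", "original"),
    "EN-trans": ("English", "translation"),
    "EN-orig": ("English", "original"),
    "IT-trans": ("Italian", "translation"),
}

# Each set of source texts is paired with its translations.
PARALLEL = {"IT-orig": "EN-trans", "EN-orig": "IT-trans"}


def comparison_type(a, b):
    """Name the kind of comparison that two components allow."""
    lang_a, status_a = COMPONENTS[a]
    lang_b, status_b = COMPONENTS[b]
    if PARALLEL.get(a) == b or PARALLEL.get(b) == a:
        return "parallel (source texts vs. their translations)"
    if lang_a == lang_b:
        return "(a) translated vs. non-translated texts, same language"
    if status_a == status_b == "original":
        return "(b) original texts across languages"
    return "(c) translations across languages"


for a, b in combinations(COMPONENTS, 2):
    print(f"{a} / {b}: {comparison_type(a, b)}")
```

Of the six possible pairings, two are parallel, two are of type (a), and one each is of type (b) and (c).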
Such a “bi-directional parallel corpus” (Aston 1999) is a “reciprocal corpus” (Teubert 1996) only as far as the size of its four components is concerned, not as regards the relationship of each component to the textual population it stands for. For instance, if the translation components are selected to represent what gets translated into each target language from the other language, then the non-translational components are not representative of textual production in either source language insofar as they are selected only from a target point of view. While it is still open to debate whether a language, or even a restricted (written) variety of it, can or should be represented at all by a corpus (cf., e.g., Leech 1991:27), it can be safely assumed that no language can be represented by a corpus which includes only texts that have been translated. Equally, however, even if a corpus were devised to represent all the texts originally written in a language, such a corpus might still not represent written text production as a whole in that language.

Paradoxically, one of the reasons why source language texts in a parallel corpus are not representative of source language production is that all of them are original, natural or spontaneous texts, as opposed to translated texts. Translation is a language production activity which is certainly subject to constraints, most notably that of a fully articulated text in another language, but it should not for this reason be considered a deviant linguistic activity (Baker 2000:32-33). If rules and conventions are established by repeated use, as corpus linguistics seems to suggest, translated texts may play a role in defining what these rules and conventions are. This seems particularly true of a language like Italian, since much of what is published in Italian is, in fact, translated from other languages. The exclusion of translations from a corpus claiming to be representative (as much as other limitations allow[2]) of Italian would thus seem especially unjustified.

2. Design criteria

The texts included in the corpus can be defined according to two sets of features: a set of selection features, which define the external boundaries of the textual population to be sampled, i.e. what we want our corpus to be representative of, and a set of descriptive features, regarding the number of texts to be included in the corpus and their extent (i.e. corpus composition), and the internal categories into which the corpus is subdivided.

2.1 Selection features

Selection features are criteria which are external to the composition of the corpus. We have used these criteria to decide which texts would go into the corpus and which texts were to be excluded.

The first criterion relates to the medium: the corpus will only contain electronic versions of (parts of) published books, i.e. printed volumes bearing an ISBN. While this constraint obviously selects only a small portion of the universe of written texts, to the exclusion of both published (e.g. newspapers and journals) and unpublished ones (e.g. road signs and grey literature), as well as ‘native’ electronic texts (web pages, e-mail messages, etc.), it also has the obvious advantage, from the corpus compiler’s point of view, of defining a fairly well-recorded population of texts. Published books are, arguably, also central to accepted standards of language production.

The second criterion is that all the texts need to be paired. Since CEXI is a parallel corpus, it can only contain texts which have been translated. This constraint clearly leaves out not only all texts which have not been translated (which constitute the vast majority), but also those translations for which there is not a discernible source text, or where the translation status is dubious.[3]

[2] The most obvious limitation is probably the availability and completeness of bibliographical data, or, as Pym would have it, the catalogues from which corpora are drawn (Pym 1998:38-42).

[3] Self-translations, i.e. when the source text author and the translator are the same person, and indirect translations, e.g. a book originally written in Italian, but translated into English from a French translation (cf. Toury 1995), were also discarded to avoid adding further variables.

A third selection feature refers to the time of publication. CEXI is a synchronic corpus of contemporary language, defined as texts published in the last 25 years (1976-2000)[4] and preferably still in print. Some exceptions were made for recently translated books where the original text was first published before 1976 (but in no case before 1945). It was also decided to include only books targeted at adult readers, to the exclusion of children’s literature, schoolbooks, and simplified readers. To ensure homogeneity and reduce the variables, only the macro-genre prose was considered, to the exclusion of poetry, drama and comic books. Only books published in Italy, the USA and the UK were taken into consideration.

A set of secondary selection criteria concerns the distribution of authors, translators and publishers in the corpus. While we wanted to have as wide as possible a range of representatives for each of these categories, we also wanted to take into account reception criteria, i.e. the respective importance of publishers, authors or translators. We thus decided not to rely simply on random sampling in order to obtain a balanced distribution of texts, but also to prefer among candidate titles best-selling works and authors, to account for readership figures. On the same grounds we decided to discard exceedingly expensive or rare books.
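The primary selection features can be read as a filter applied to each candidate text pair. The following is a minimal sketch under stated assumptions: the record type, its field names and the function name are hypothetical, and the 1945 exception for older source texts, which the project applied case by case, is treated here as always admissible. The secondary criteria (best-seller preference, price, rarity) are not modelled.

```python
from dataclasses import dataclass

ALLOWED_COUNTRIES = {"Italy", "USA", "UK"}


@dataclass
class Candidate:
    """A translation and its source text (all field names are hypothetical)."""
    has_isbn: bool
    translation_year: int
    source_year: int
    source_country: str
    translation_country: str
    audience: str = "adult"          # e.g. "adult", "children", "school"
    macro_genre: str = "prose"       # e.g. "prose", "poetry", "drama", "comics"
    self_translation: bool = False
    indirect_translation: bool = False


def is_eligible(c: Candidate) -> bool:
    """Apply the primary selection features described in the text."""
    return (
        c.has_isbn                                   # medium: published books only
        and 1976 <= c.translation_year <= 2000       # contemporary translations
        and 1945 <= c.source_year <= c.translation_year  # no source text before 1945
        and c.source_country in ALLOWED_COUNTRIES
        and c.translation_country in ALLOWED_COUNTRIES
        and c.audience == "adult"
        and c.macro_genre == "prose"
        and not c.self_translation
        and not c.indirect_translation
    )


# Invented examples, not real bibliographic records.
novel = Candidate(True, 1983, 1980, "Italy", "USA")
poems = Candidate(True, 1990, 1985, "UK", "Italy", macro_genre="poetry")
print(is_eligible(novel), is_eligible(poems))
```

Keeping each criterion as a separate condition makes it easy to relax one of them later, for instance when satellite corpora of poetry or children’s fiction are added.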

2.2 Descriptive features

Descriptive features are a set of criteria which describe the internal composition of the corpus. First, CEXI can be described in terms of its overall projected size, the number of texts for each of its components, and the extent of these texts (e.g. full texts vs. samples). The envisaged overall size has been established at about 4 million words, i.e. four 1 million word components. One million words is about 2,800 pages, or about ten to fifteen books. This latter, however, seemed too small a number to allow for any generalizations and thus for effective reference use. Thus, although we would have preferred to have full texts, we had no choice but to opt for samples, and decided upon 80 texts per component as the minimum number, which on a total of 4 million words means 320 samples (or 160 sample pairs) of about 12,500 words each. While not being substitutes for full texts, text samples have the advantage that, being of the same length, they are more amenable to statistical analysis, and copyright clearance may also prove easier to obtain for them than for full texts. We have not, however, ruled out the possibility of acquiring at least some texts in full.

[4] This date was chosen as a divide as the Index Translationum database on CD-ROM, 1998 edition, goes back to that year for consistent data regarding the translations in question.

Having decided on the number and extent of the texts to be included in each corpus component, we next focused on the composition of each component of the corpus. We wanted to take both a production and a reception perspective, and for this needed statistics regarding book publishing in three different countries, and specifically about books translated from English and Italian. Books are categorized differently by different people, e.g. by publishers, by booksellers, by librarians and, last but not least, by corpus compilers. After looking at specialized publications such as European Bookseller and Publishers Weekly, and well known corpora such as the Brown Corpus, the British National Corpus and the ENPC, we decided to adopt a broad distinction between imaginative and informative texts, which in our case means fiction and non-fiction, and to split the corpus evenly between these two domains. This solution, which has the advantage of obtaining comparable quantities of data for fiction and non-fiction, does not seem to be totally arbitrary, as can be seen from the figures for Italy in Table 1.

                                      Fiction   Non-fiction   Total
Titles (1994-1996)                      30%        70%        100%
Titles x print numbers (1994-1996)      50%        50%        100%
Copies sold (1990)                      65%        35%        100%
  of which originals                    24%        24%
  of which translations                 41%        11%

Table 1: Publication figures for Italy (from Vigini 1999, Berla 1993).

If we take the average number of titles per year published in Italy between 1994 and 1996 (data from Vigini 1999), for instance, we find that fiction accounts for 30%. However, if we instead take the figure obtained by multiplying the number of titles by the print run of each, fiction accounts for half the copies of books produced. If we move towards the reception end of the spectrum and look at the figures for books actually bought, the weight of fiction is even greater, accounting for 65% of all books sold (Berla 1993:62). We can also see that translated works of fiction have almost twice the sales of originals (41% vs. 24%), while the opposite is true for non-fiction (11% vs. 24%).

Summing up, in Italy fewer fiction titles than non-fiction titles are published, but more fiction is sold, and most of this is translated. Half of the books sold are translations, and of these, four out of five are works of fiction.[5]

[5] While the data concerning copies sold refer to an earlier period, the proportion of translations sold would subsequently appear to have increased rather than diminished, judging from the number of titles and print runs (Vigini 1999:70). The figures for titles produced indicate that in 1994-96 25% of all published books were translations, and English was the source language for one out of two of these (Vigini 1999:87). Our decision to subdivide each component into two equally sized fiction and non-fiction subcomponents was also supported by data from a reader response survey (Vigini 1999:117-143).
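The summary proportions drawn from the "copies sold" rows of Table 1 can be re-derived with a few lines of arithmetic. This is only a check on the published percentages; the variable names are introduced here for illustration.

```python
# Shares of copies sold in Italy in 1990 (percentages from Table 1).
copies_sold = {
    ("fiction", "original"): 24,
    ("fiction", "translation"): 41,
    ("non-fiction", "original"): 24,
    ("non-fiction", "translation"): 11,
}

# Fiction as a share of all copies sold.
fiction_share = sum(
    pct for (domain, _), pct in copies_sold.items() if domain == "fiction"
)

# Translations as a share of all copies sold.
translation_share = sum(
    pct for (_, status), pct in copies_sold.items() if status == "translation"
)

# Fiction as a proportion of the translations sold.
fiction_within_translations = (
    copies_sold[("fiction", "translation")] / translation_share
)

print(fiction_share)                           # 65
print(translation_share)                       # 52
print(round(fiction_within_translations, 2))   # 0.79
```

The 52% and 0.79 figures correspond to the statements that about half of the books sold are translations and that roughly four out of five of these are fiction.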


It is not easy to obtain reliable data for translations within the total populations of published books in different countries, particularly as far as specific source languages are concerned. However, the incidence of translation seems to be much greater for Italian than for English. With the development of the mass market in Italy from the 1980s onwards, translations have come to play an increasing role. In 1998, 27.9% of the 44,964 titles published in Italy were translations. Translations from English accounted for over half this number (16.5% of all published titles), and average printings for translations from English were almost double those for Italian originals (8,624 vs. 4,756, data from Lottman 2000). It seems not far from the truth to say that almost one book out of three sold today in Italy is a translation from English.

The situation of the UK and the USA is quite different. Translations there account for 2-3% of total titles produced, and they are rarely best-sellers (Hale 1996:27, Venuti 1995:12-15). Translations from Italian are a minor percentage of these.

3. Text selection

Each text in the corpus can thus be classified according to the following variables:

· Translation vs. non-translation
· Fiction vs. non-fiction
· English vs. Italian

In the following sections each of the subcategories resulting from the intersection of these variables will be considered.

3.1 Translations

The Index Translationum (1998 edition on CD-ROM) is a database published by UNESCO and compiled with data submitted from libraries around the world. Though far from satisfactory, it is the most complete list of translated books available. From this database we selected (a.) all translations from English into Italian published in Italy between 1976 and 1995,[6] and (b.) all translations from Italian into English published in the USA and the UK between 1977 and 1996.[7] Each entry in the Index Translationum is assigned to a subject category from the Universal Decimal Classification (UDC). If we examine the data in Table 2, we see that not only are there more books translated from English into Italian than vice versa in absolute terms, but we also have very different percentages for each category.

[6] Only 13 texts are prior to that date and none is recorded after it.

[7] There are no entries for translations from Italian published in the UK after 1988. As the data for the period 1977-1988 are similar for the UK and the USA, we chose as a sampling frame the data for the USA between 1977 and 1996.

UDC category                         E→I (Italy 76-95)    I→E (USA 77-96)
                                      Texts       %        Texts       %
Literature/Children’s Literature      4,817      40%         502      28%
Art/Games/Sport                         757       6%         343      19%
Education/Law/Social Sciences         1,251      11%         187      11%
Applied Sciences                      1,835      16%         138       8%
History/Geography/Biography             919       8%         171      10%
Natural and Exact Sciences              643       6%         111       6%
Philosophy/Psychology                   833       7%          53       3%
Religion/Theology                       477       4%         267      15%
Generalities/Information Sciences       101       1%           2       0%
Total                                11,633     100%       1,774     100%

Table 2: Translated titles in Italy (from English) and the USA (from Italian).

3.1.1 Translated fiction

The Index Translationum UDC category of Literature/Children’s literature is in some respects too broad for our purposes, as it includes text types we had decided not to include, such as non-contemporary fiction (i.e. translations of source texts first published before 1945), poetry, drama, comics and, of course, children’s literature. The category also includes some text types we would rather assign to the non-fictional component, e.g. literary criticism and linguistics. At the same time, it does not differentiate between fictional subcategories, such as general fiction, romance, thrillers and science-fiction, numbers for which are available for total book production and which seem potentially revealing of different translation policies and practices in the different countries.

From the analysis of sets of 312 randomly sampled titles from the Index (cf. 4.1 below), as well as from a survey of book publishing magazines and other sources, what emerged was not only that the flow of fiction translations from English into Italian is much more substantial than that in the opposite direction (both in absolute numbers and even more so as a proportion of total production), but also that different types of fiction get translated into English and into Italian. Only a very small proportion of fiction published in the UK and the USA consists of translations, and these books mostly belong to what has been called “difficult literary fiction” (Schiller 1993:28). Only a very few translations are best-sellers, notable exceptions from
Italian being Umberto Eco’s The Name of the Rose and Oriana Fallaci’s A Man. In the random selection of titles from the Index we find, besides re-translations or re-printings of Dante, Boccaccio and Pirandello, works by Calvino, Sciascia, Pasolini, Silone and Tabucchi, but hardly any popular fiction. What is instead translated from English into Italian, besides evergreens such as Shakespeare and Dickens and some ‘high quality’ literary fiction, is mostly best-selling authors such as Crichton, Grisham and King, and popular romance and detective stories. Similar imbalances occur in translated non-fiction, as will be shown below.

3.1.2 Translated non-fiction

We decided to model the non-fictional components of the corpus on the Index Translationum percentages for the eight non-literary UDC categories for Italy and the USA (Table 3).

UDC categories                       % of translated titles      No. of texts to be included in corpus
                                     E→I (Italy)   I→E (USA)     E→I      I→E
Art/Games/Sport                         11%           27%          4       11
Education/Law/Social Sciences           19%           15%          8        6
Applied Sciences                        28%           11%         11        4
History/Geography/Biography             13%           13%          5        5
Natural & Exact Sciences                 9%            9%          4        4
Philosophy/Psychology                   12%            4%          5        2
Religion/Theology                        7%           21%          3        8
Generalities/Information Sciences        1%            0%          0        0
Total                                  100%          100%         40       40

Table 3: Translational components, non-fiction.

Religion/Theology and Art/Games/Sport are the subcategories most translated from Italian into English, representing almost half of non-fiction translations. Conversely, Applied Sciences books have a much higher percentage of translations from English into Italian than vice versa. When we map these figures onto the total of 40 texts which constitutes the non-fictional component for each language we obtain the composition shown in the columns on the right of Table 3.
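The mapping of the Table 3 percentages onto 40 texts per direction can be sketched as a proportional allocation. The paper does not state the rounding rule that was used; the sketch below assumes plain rounding to the nearest integer, which happens to reproduce the text counts in Table 3 for both directions. The dictionary and function names are introduced here for illustration.

```python
NON_FICTION_TEXTS = 40  # non-fiction texts per translational component

# Percentages of translated non-fiction titles per UDC category (Table 3).
SHARES = {
    "Art/Games/Sport":                   {"E->I": 11, "I->E": 27},
    "Education/Law/Social Sciences":     {"E->I": 19, "I->E": 15},
    "Applied Sciences":                  {"E->I": 28, "I->E": 11},
    "History/Geography/Biography":       {"E->I": 13, "I->E": 13},
    "Natural & Exact Sciences":          {"E->I": 9,  "I->E": 9},
    "Philosophy/Psychology":             {"E->I": 12, "I->E": 4},
    "Religion/Theology":                 {"E->I": 7,  "I->E": 21},
    "Generalities/Information Sciences": {"E->I": 1,  "I->E": 0},
}


def allocate(direction, total=NON_FICTION_TEXTS):
    """Distribute `total` texts over the categories in proportion to their share.

    Assumption: simple rounding. With other percentages the rounded counts
    might not add up to `total` and would need a manual adjustment.
    """
    return {
        category: round(total * pct[direction] / 100)
        for category, pct in SHARES.items()
    }


for direction in ("E->I", "I->E"):
    counts = allocate(direction)
    print(direction, counts, "total:", sum(counts.values()))
```

For both directions the rounded counts sum to exactly 40, so no further adjustment is needed for these particular figures.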

3.2 Non-translations

Since CEXI is a parallel corpus, the non-translational components are modelled on the translational ones, mirroring their composition, and this affects both the types of comparison which can be made between translations and non-translations in the same language and those which can be made between the non-translational components across languages. Table 4 shows the composition of both the fictional and non-fictional sections in the two languages in comparison to book production as a whole in Italy, the UK, and the USA. In order to compare the data, statistics regarding book publishing as a whole have been adjusted to fit the Index Translationum categories.

               Italian (titles)                     English (titles)
               Book production   Translations       Book production        Translations
               (Italy, 90-96)    (E→I, 78-96)       (USA, UK, 90-96)       (I→E, 78-96)
Fiction            27%              40%             22% (USA), 24% (UK)    28% (USA), 31% (UK*)
Non-fiction        73%              60%             78% (USA), 76% (UK)    72% (USA), 69% (UK*)

Table 4: Book production vs. translations (* UK = 1978-1989).

While publication figures are similar in the three countries, the number of non-fiction titles being greater than fiction ones, in Italy translated titles show a higher percentage for fiction than in either the UK or the USA. Fiction accounts for 40% of all translations from English published in Italy, while it represents only about 30% of translated titles from Italian into English.

3.2.1 Original fiction

The fictional non-translational component for Italian is mostly composed of ‘high-quality’ literary fiction. For this corpus component to be representative of book production in Italy, what is missing is translations, Italian best-sellers and popular fiction. Conversely, the non-translational English fiction component includes mostly best-sellers and popular fiction. The non-translational components are therefore not representative of book production or reception in their respective countries, and, moreover, inter-linguistic comparison between translations and source texts in the two languages and cross-linguistic comparison between Italian and English source texts imply a comparison of different fictional subgenres. The composition of the corpus will juxtapose, for Italian, translated middle- and low-brow literature with high-brow ‘original’ literature, and for English, translated Italian ‘literature’ with ‘original’ popular fiction and best-sellers. When comparing the ‘originals’ this might give rise to a view of the two languages which does not correspond to actual language use (though it may correspond to the picture that language users have of the foreign languages and cultures).


3.2.2 Original non-fiction

A similar picture emerges even more clearly if we compare data for book production in Italy, the UK and the USA as regards non-fiction works with the composition of the Italian and English non-translational, non-fiction components of the corpus. If we compare the non-translational, non-fiction Italian component of the corpus with the total production of titles in Italy, we find that what gets translated into English (from Italian source texts) bears little correspondence to what gets published in Italy. Most notably, books belonging to the Religion/Theology, Art/Games/Sport, and Natural and Exact Sciences categories are translated into English much more often than would be expected given overall book production figures. The opposite is true for Education/Law/Social Sciences and Philosophy/Psychology, where proportionally only about half of what would be expected is translated into English.

Comparable data for book production for the same years (1990-95) for the UK and the USA also reveal that the composition of the non-translational, non-fiction English component of the corpus is different from what it would be if put together on the basis of overall publication figures for the two countries. Most notably, the Applied Sciences and Philosophy/Psychology categories are over-represented, while the Art/Games/Sport category is under-represented (cf. Table 5).

                                     Non-transl.   Production   Non-transl.   Production   Production
                                     Italian       in Italy     English       in USA       in UK
                                     component                  component
Art/Games/Sport                         27%          16%          11%            9%          27%
Education/Law/Social Sciences           15%          28%          19%           29%          20%
Applied Sciences                        11%          16%          28%           23%          15%
History/Geography/Biography             13%          15%          13%           13%          13%
Natural & Exact Sciences                 9%           5%           9%            8%           4%
Philosophy/Psychology                    4%           8%          12%            5%           3%
Religion/Theology                       21%           8%           7%            7%          12%
Generalities/Information Sciences        0%           4%           1%            6%           6%
Total                                  100%         100%         100%          100%         100%

Table 5: Corpus (provisional) composition vs. total production in Italy, the USA, and the UK.

3.3 Juggling with numbers

Faced with these differences in the composition of the non-fiction components, and the mismatches with overall title production figures, we decided on a compromise strategy which will, we believe, permit effective comparisons to be made by varying the corpus composition according to the aims of the analysis. By adding just 12 translations (and source texts) for each language, the corpus will have the same number of texts per non-fiction subcategory in each component. This will be useful when comparing translations and non-translations in the same language, or non-translations across languages, while leaving open the option of ignoring these additional texts when this is judged more appropriate (cf. Table 6).

                                     Initial no.      Supplementary    Final no. of texts
                                     of texts         texts            (per language)
                                     Engl.    It.     Engl.    It.
Art/Games/Sport                        11      4        -       7            11
Education/Law/Social Sciences           6      8        2       -             8
Applied Sciences                        4     11        7       -            11
History/Geography/Biography             5      5        -       -             5
Natural & Exact Sciences                4      4        -       -             4
Philosophy/Psychology                   2      5        3       -             5
Religion/Theology                       8      3        -       5             8
Generalities/Information Sciences       0      0        -       -             0
Total                                  40     40       12      12            52

Table 6: Corpus composition, non-fiction.

The components will still not reflect overall publication figures for the countries involved, but they will at least be mutually comparable. We are currently examining the possibility of similar adjustments for fiction categories.
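The supplementary counts in Table 6 follow from a simple rule: in each category, the language with fewer texts is topped up to the level of the other. A minimal sketch of that rule, with names introduced here for illustration, is:

```python
# Initial number of non-fiction texts per category: (English, Italian), from Table 6.
INITIAL = {
    "Art/Games/Sport":                   (11, 4),
    "Education/Law/Social Sciences":     (6, 8),
    "Applied Sciences":                  (4, 11),
    "History/Geography/Biography":       (5, 5),
    "Natural & Exact Sciences":          (4, 4),
    "Philosophy/Psychology":             (2, 5),
    "Religion/Theology":                 (8, 3),
    "Generalities/Information Sciences": (0, 0),
}


def equalise(initial):
    """Top up the smaller side of each category to match the larger one."""
    result = {}
    for category, (engl, it) in initial.items():
        final = max(engl, it)
        result[category] = {
            "final": final,
            "extra_engl": final - engl,
            "extra_it": final - it,
        }
    return result


plan = equalise(INITIAL)
print(sum(row["extra_engl"] for row in plan.values()))  # 12
print(sum(row["extra_it"] for row in plan.values()))    # 12
print(sum(row["final"] for row in plan.values()))       # 52
```

The totals match the last row of Table 6: 12 supplementary texts per language and 52 non-fiction texts per component.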

4. Corpus composition

Consideration of the texts to be included in the corpus has shown that in order to compare texts across languages, even within the same broad domain (fiction vs. non-fiction), the composition of the corpus needs to be adjusted to account for different text production policies in the different countries (both regarding original texts and translations), and the same is true when comparing originals and translations in the same language. Furthermore, while the scope of the project forced us to choose samples (rather than full texts), this may well limit the range of uses of the corpus. We have therefore decided to work first towards the creation of a core corpus following the parameters outlined above, which can then be expanded in a number of ways.

4.1 Core corpus

Each of the four components of the core corpus (translations and originals, Italian and English) will be made of a set number of texts, adding up to a total of 368 text samples, or approximately 4.6 million words.

Category                     No. of text samples   No. of words
Fiction                               40               500,000
Non-fiction                           52               650,000
Total for each component              92             1,150,000
Total for core corpus                368             4,600,000

Table 7: Core corpus.

In order to reach the desired number of texts for the core corpus we selected, from the Index Translationum, a random list of 312 translated titles for each of the two categories (fiction vs. non-fiction) for each language. From this total of 1248 titles we then manually discarded those books which did not fit our criteria, while at the same time filling in bibliographical details, including copyright information, for the translations selected and for the corresponding source texts. To this first list we added a small number of texts chosen according to other criteria (e.g. best-sellers, texts included in the ENPC). We thus ended up with a list of about 1,500 titles, half of them originals and half of them translations. We are now negotiating copyright clearance with authors and publishers in order to obtain permission to finally create electronic versions of at least 184 (2 x 92) text pairs.
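The size figures quoted for the initial plan and for the core corpus are linked by simple arithmetic, which the following lines retrace. The variable names are introduced here for illustration; all numbers come from the text and Table 7.

```python
# Initial plan: about 4 million words, 80 texts in each of 4 components.
initial_samples = 80 * 4
words_per_sample = 4_000_000 // initial_samples

# Core corpus: 40 fiction and 52 non-fiction samples per component (Table 7).
samples_per_component = 40 + 52
words_per_component = samples_per_component * words_per_sample
core_samples = 4 * samples_per_component
core_words = 4 * words_per_component

# Each source text is paired with its translation, in two directions.
text_pairs = 2 * samples_per_component

# Random lists from the Index Translationum: 312 titles for each of
# two categories (fiction, non-fiction) in each of two languages.
titles_sampled = 312 * 2 * 2

print(f"{initial_samples} samples of {words_per_sample:,} words")
print(f"{samples_per_component} samples, {words_per_component:,} words per component")
print(f"{core_samples} samples, {core_words:,} words in the core corpus")
print(f"{text_pairs} text pairs, {titles_sampled} titles sampled")
```

The increase from 4 to 4.6 million words is thus entirely due to the 12 supplementary non-fiction texts added to each component.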

4.2 Expansions

We plan on three kinds of subsequent expansion, concerning text extent, number and type. While the core corpus will be composed of text samples, we are asking permission preferably for whole texts, so that what is left out from the 12,500 word samples can be acquired later on. Even if only a minority of the parallel texts are complete, we may still want to study individual texts and compare them against the backdrop of the full corpus.

A second kind of expansion refers to the acquisition of other texts in order to permit corrections to the composition of the non-translational components of the corpus. Thus we would want to include, for instance, Italian popular fiction and best-sellers which have not been translated into English if this corpus component is to represent the population of Italian ‘original’ narrative fiction. If it is desired to obtain a global representation of narrative fiction published in Italy, it will be necessary to also include translations, from other languages as well as from English.

Finally, with time we hope to add satellite corpora of text types excluded from the original design, such as poetry or children’s fiction.

4.3 Future goals

CEXI is the first project aiming at the construction of a fairly well-balanced, bidirectional parallel corpus of Italian and English prose. The stages of corpus design and copyright clearance are only preliminary to the acquisition of the texts in electronic format, their encoding and alignment.8 When these have been completed the corpus will be made available via a web interface, hopefully providing a valuable resource for the study of translation and language and for translator training.

8 The corpus will be encoded according to the XML/TEI and XML/CES international standards (cf. Sperberg-McQueen/Burnard 1999, Ide/Bonhomme 2000).

References

Aijmer, Karin/Bengt Altenberg, eds. (1991), English Corpus Linguistics. Studies in Honour of Jan Svartvik. London & New York: Longman.

Aston, Guy (1999), “Corpus Use and Learning to Translate,” in: Bassnett/Bollettieri Bosinelli/Ulrych (1999), 289-314.

Baker, Mona (1995), “Corpora in Translation Studies: An Overview and some Suggestions for Future Research,” Target 7:2, 223-243.

Baker, Mona (1996), “Corpus-based Translation Studies: The Challenges that Lie Ahead,” in: Somers (1996), 175-186.

Baker, Mona (2000), “Linguistica dei corpora e traduzione. Per un’analisi del comportamento linguistico dei traduttori professionisti,” in: Bernardini/Zanettin (2000), 31-44.

Bassnett, Susan/Rosa Maria Bollettieri Bosinelli/Margherita Ulrych, eds. (1999), Translation Studies Revisited. Textus XII:2, Genova: Tilgher.

Berla, Erica (1994), “Italy Takes to Foreign Fiction,” European Bookseller, January/February 1994, 62-63.

Bernardini, Silvia/Federico Zanettin, eds. (2000), I corpora nella didattica della traduzione, Bologna: CLUEB.

Fries, U./G. Tottie/P. Schneider, eds. (1994), Creating and Using English Language Corpora, Amsterdam & Atlanta, GA: Rodopi.

Gavioli, Laura/Federico Zanettin (2000), “I corpora bilingui nell’apprendimento della traduzione. Riflessioni su un’esperienza pedagogica,” in: Bernardini/Zanettin (2000), 61-80.


Hale, Terry (1996), “Redressing the Balance,” European Bookseller, June/July 1996, 27-29.

Ide, Nancy/Patrick Bonhomme (2000), XML Corpus Standard Encoding Document XCES 0.2, http://www.cs.vassar.edu/XCES/

Index Translationum, 5th edition, UNESCO 1998.

Johansson, Stig (1998), “On the Role of Corpora in Cross-linguistic Research,” in: Johansson/Oksefjell (1998), 3-24.

Johansson, Stig (forthcoming), “Reflections on Corpora and their Uses in Cross-linguistic Research,” in: Zanettin/Bernardini/Stewart (forthcoming).

Johansson, Stig/Jarle Ebeling/Signe Oksefjell, English Norwegian Parallel Corpus: Manual. Oslo, http://www.hf.uio.no/iba/prosjekt/ENPCmanual.html

Johansson, Stig/Knut Hofland (1994), “Towards an English-Norwegian Parallel Corpus,” in: Fries/Tottie/Schneider (1994), 25-37.

Johansson, Stig/Signe Oksefjell, eds. (1998), Corpora and Cross-linguistic Research. Theory, Method, and Case Studies, Amsterdam & Atlanta, GA: Rodopi.

Laviosa, Sara (1997), “How Comparable can ‘Comparable Corpora’ be?,” Target 9:2, 289-319.

Laviosa, Sara (1998), “Core Patterns of Lexical Use in a Comparable Corpus of English Narrative Prose,” Meta 43:4, 557-570.

Leech, Geoffrey (1991), “The State of the Art in Corpus Linguistics,” in: Aijmer/Altenberg (1991), 8-29.

Lottman, Herbert R. (2000), “Italy Top Market for Translations,” Publishers Weekly, 1/10/2000.

Mauranen, Anna (2000), “Strange Strings in Translated Language. A Study on Corpora,” in: Olohan (2000), 119-141.

Olohan, Maeve, ed. (2000), Intercultural Faultlines. Research Models in Translation Studies I. Textual and Cognitive Aspects, Manchester: St Jerome.

Pym, Anthony (1998), Method in Translation History, Manchester: St Jerome.

Schiller, Heather (1993), “Fiction: What Works in Britain,” European Bookseller, September/October 1993, 24-28.

Somers, Harold, ed. (1996), Terminology, LSP, and Translation: Studies in Language Engineering in Honour of Juan C. Sager, Amsterdam & Philadelphia: John Benjamins.

Sperberg-McQueen, C.M./L. Burnard, eds. (1999), Guidelines for Electronic Text Encoding and Interchange, Revised Reprint, Oxford, http://www.hcu.ox.ac.uk/TEI/Guidelines/.

Teubert, Wolfgang (1996), “Comparable or Parallel Corpora?,” International Journal of Lexicography, 9 (3): 238-264.

Toury, Gideon (1995), Descriptive Translation Studies – and Beyond, Amsterdam & Philadelphia: John Benjamins.


Venuti, Lawrence (1995), The Translator’s Invisibility, New York & London: Routledge.

Vigini, Luigi (1999), Rapporto sull’editoria italiana, Milano: Editrice bibliografica.

Zanettin, Federico (1998), “Bilingual Comparable Corpora and the Training of Translators,” Meta 43:4, 616-630.

Zanettin, Federico (2000), “Parallel Corpora in Translation Studies: Issues in Corpus Design and Analysis,” in: Olohan (2000), 105-118.

Zanettin, Federico/Silvia Bernardini/Dominic Stewart, eds. (forthcoming), Corpora in Translator Education, Manchester: St Jerome.
