EST Subseries
The European Society for Translation Studies (EST) Subseries is a publication
channel within the Library to optimize EST’s function as a forum for the
translation and interpreting research community. It promotes new trends in
research, gives more visibility to young scholars’ work, publicizes new research
methods, makes available documents from EST, and reissues classical works in
translation studies which do not exist in English or which are now out of print.
Advisory Board
Rosemary Arrojo, Binghamton University
Michael Cronin, Dublin City University
Daniel Gile, Université Paris 3 - Sorbonne Nouvelle
Ulrich Heid, University of Stuttgart
Amparo Hurtado Albir, Universitat Autònoma de Barcelona
W. John Hutchins, University of East Anglia
Zuzana Jettmarová, Charles University of Prague
Werner Koller, Bergen University
Alet Kruger, UNISA, South Africa
José Lambert, Catholic University of Leuven
John Milton, University of São Paulo
Franz Pöchhacker, University of Vienna
Anthony Pym, Universitat Rovira i Virgili
Rosa Rabadán, University of León
Sherry Simon, Concordia University
Mary Snell-Hornby, University of Vienna
Sonja Tirkkonen-Condit, University of Joensuu
Maria Tymoczko, University of Massachusetts Amherst
Lawrence Venuti, Temple University
Volume 82
Corpus Use and Translating. Corpus use for learning to translate
and learning corpus use to translate
Edited by Allison Beeby, Patricia Rodríguez Inés and Pilar Sánchez-Gijón
Corpus Use and Translating
Corpus use for learning to translate and learning
corpus use to translate
Edited by
Allison Beeby
Patricia Rodríguez Inés
Pilar Sánchez-Gijón
Universitat Autònoma de Barcelona
Corpus use and translating : corpus use for learning to translate and learning corpus
use to translate / edited by Allison Beeby, Patricia Rodríguez Inés and Pilar
Sánchez-Gijón.
p. cm. (Benjamins Translation Library, ISSN 0929-7316 ; v. 82)
Includes bibliographical references and index.
1. Translating and interpreting--Data processing. 2. Corpora (Linguistics) 3.
Translators--Training of. I. Beeby Lonsdale, Allison. II. Rodríguez Inés,
Patricia. III. Sánchez-Gijón, Pilar.
P309.C67 2009
440--dc21 2008041947
ISBN 978 90 272 2426 2 (Hb; alk. paper)
ISBN 978 90 272 9106 6 (eb)
Introduction 1
Allison Beeby, Patricia Rodríguez Inés and Pilar Sánchez-Gijón
Evaluating the process and not just the product when using corpora
in translator education 129
Patricia Rodríguez Inés
Guy Aston
University of Bologna at Forlì, Italy
The Corpus Use and Learning to Translate (CULT) workshops were born out of two beliefs. First, that language corpora, if selected and used appropriately, can provide the translator with more abundant and reliable information than traditional reference tools such as dictionaries and “parallel texts”. Second, as previous work in foreign language teaching had suggested, that corpora can offer learning environments which empower learners and increase their autonomy, allowing them to develop their knowledge and awareness while at the same time providing them with a range of opportunities for using the language, including opportunities to engage in translation.
Much of the discussion within CULT has focussed on the potential of different
types of corpora – from large established monolingual mixed reference corpora
to small do-it-yourself specialised monolingual or comparable ones, from paral-
lel corpora of original texts and their official translations to corpora of learner
translations. A lot of work has gone into developing better ways of constructing
appropriate corpora, and better tools to interrogate them within a “translator’s
workbench”. At the same time, there has been continual discussion of how we can
develop translators’ ability to exploit corpora effectively.
The difficulty of using corpora is that they rarely provide immediate answers
to a translator’s problems. Unlike translation memory or machine translation sys-
tems, they do not instantly present a preferred candidate for the user to accept,
modify or reject. Corpus data has to be interpreted and evaluated comparatively
to reach conclusions, and this requires not only technical skill (perhaps the least
of the problems, since learners’ computational competence is often greater than
their teachers’), but above all critical thought. Training would-be translators to
use corpora goes hand in hand with educating them to think about the translation
process and the learning process, developing their sensitivity as to how they can
use corpora in these processes.
It is difficult to deny that corpus use is anti-economic in the short term, and
this is probably why, while increasingly taught in translation schools, it has not
Guy Aston
References
Bernardini, S. & F. Zanettin (eds.) (2000). I corpora nella didattica della traduzione – Corpus Use
and Learning to Translate. Bologna: Cooperativa Libraria Universitaria Editrice.
Zanettin, F., S. Bernardini & D. Stewart (eds.) (2003). Corpora in Translator Education. Manchester: St Jerome.
Introduction
A selection of papers from the 1997 conference was included in Bernardini and Zanettin (2000).
Some of the work presented at the CULT2K conference provided the kernel for Zanettin, Bernardini and Stewart (2003).
Allison Beeby, Patricia Rodríguez Inés and Pilar Sánchez-Gijón
was held in 2004. Most, but not all, of the contributions to this volume devel-
oped out of the Barcelona conference (CULT BCN), which may explain the book’s
Spanish flavour. Further background to the disciplines involved in CULT can be
found in different chapters of this book: corpus linguistics (Corpas Pastor and
Seghiri Domínguez), corpus-based translation studies (Stewart, Frankenberg-
Garcia and Philip), corpora in language teaching and translator training (Marco
and Van Lawick, Sánchez-Gijón and Rodríguez Inés).
Many of the issues addressed in this book are related to questions that were
raised in CULT2K. What is the role of corpora in documentation for translators?
Why bother with corpora when we have the Internet? If we use ad hoc or disposable corpora, how can we be sure they are reliable or representative? Is the time needed to learn how to build and use corpora worth the effort? As Silvia Bernardini
said in Barcelona in 2004, we are still looking for a balance between training and
education, between the claims of Gouadec (2002/2007), “no serious translator
training programme can be dreamt of unless the training environment emulates
the work station of professional translators” and the reminder of Mossop (1998),
“if you can’t translate with pencil and paper, then you can’t translate with the lat-
est information technology.” In fact, translation faculties have to find this balance
between the positions of Gouadec and Mossop if their graduates are to survive in
the real world of the professional translator in the 21st century.
Translation has been an object of research in Artificial Intelligence and the
computer sciences and attempts have been made to make the translation process
partially or totally automatic. Fully automatic translation programmes remain
a chimera and researchers have turned to less ambitious but more productive
projects. Some of these have revolutionized both the way translators work today (for example, computer-assisted translation programmes) and the way they solve problems (for example, terminology databases). Corpus linguistics has also been added to the technology-based battery of resources at the translator’s disposal. However, in the case of corpus linguistics, the technology comes with a methodology, a number of free-access corpora and the possibility of building a corpus with relative ease. Corpus linguistics tools allow translators to
approach texts, their own and those of others, and analyse them both quantita-
tively and qualitatively.
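The Python sketch below illustrates the quantitative side of such analysis by building a simple frequency list. It is a minimal example under stated assumptions and does not describe any particular corpus tool. The tokenizer is a naive regular expression that lower-cases the text and treats runs of word characters as tokens, and the function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Naive tokenizer (assumption): lower-cased runs of word characters."""
    return re.findall(r"\w+", text.lower())

def frequency_list(text, n=10):
    """Return the n most frequent word forms as (form, count) pairs."""
    return Counter(tokenize(text)).most_common(n)

sample = "the corpus shows the pattern and the corpus confirms it"
for form, count in frequency_list(sample, n=2):
    print(form, count)
```

Running it prints `the 3` and then `corpus 2`. A frequency list of this kind is only a starting point. The qualitative side of the analysis depends on reading the contexts in which the forms occur.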
Translator trainers have been using these tools in the classroom for over a
decade, both in general and specialised translation and in both directions, B-A/
A-B (translating into/out of the translator’s language of habitual use). Corpora
have proved to be very useful when trainee translators are working into a foreign
language and have to compensate for insecurities in the target language and cul-
ture. Several of the contributors to this book have used corpora to teach transla-
tion into a foreign language (Stewart 2001; Corpas 2001; Rodríguez Inés 2008).
The PACTE group (Process in the Acquisition of Translation Competence and Evaluation), which has been conducting empirical research on translation competence (TC) and its components since 1997, has put forward a TC model in which “translation competence is the underlying system of knowledge needed to translate. It includes declarative and procedural knowledge, but the procedural knowledge is predominant” (PACTE 2003: 43–66).
integrate other kinds of declarative and procedural knowledge needed for trans-
lation competence, such as field specific knowledge of specialised genres, docu-
mentation, terminology, IT and translator tools. Of course, all these can be taught
as separate subjects, but it is probably more efficient to teach them as part of a
translation task in a specialised translation course. Furthermore, integrating the
different kinds of knowledge to solve problems should encourage critical thinking
and help teachers to find the right balance between education and training. The
choice of methodology will depend on the objectives of the teaching module, the field and the kinds of corpora used.
As was mentioned above, the contributions to this volume fall into two main
sections that are reflected in the subtitle of the book: Corpus use for learning to
translate and learning corpus use to translate.
The first part, Corpus use for learning to translate, or corpora as a source of materials for translator training, is the least controversial. This methodology has been widely used in language teaching: the corpora are selected and controlled by the teacher to provide real-life examples and exercises. The time spent
learning corpus use is invested by the teacher, who then has a marvellous tool with
which to produce teaching materials that can be used for very specific learning
tasks directed at the needs of a particular group of students. Student-centred teach-
ing has obvious pedagogical advantages. Depending on the nature of the tasks,
the students’ learning can be deductive or inductive and they can see that there
are other sources of authority apart from the teacher’s ‘intuition’. It is true that this
use of corpora to develop teaching materials is well established for learning about
certain aspects of translation related to contrastive language or terminology. The
first chapter in this section falls into this category. However, corpus methodol-
ogy can also be used to prepare classroom materials designed to raise awareness
about more complex, or lesser known phenomena, for example, semantic prosody
in Chapter 2 and explicitation as a translation universal in Chapter 3.
In the first chapter, ‘Using corpora and retrieval software as a source of ma-
terials for the translation classroom’, Josep Marco and Heike Van Lawick provide
a useful introduction to teachers wanting to begin to work in this field. The au-
thors offer a brief review of the origins of corpus-related resources in translator
training and the distinction between corpus-based and corpus-driven learn-
ing. In the first case, teachers select material from corpora to design classroom
materials for specific objectives. In the second case, students have access to an
enormous range of language data, but they have to learn how to use this data for
autonomous learning.
See Beeby (1996), Hurtado (1999), Kiraly (2000), González Davies (2004).
and idiosyncratic language in the source text and produce translation equivalence
in the target text. According to Philip, parallel corpora are usually neither large
nor wide-ranging enough to be able to provide much information on generalised
norms within the languages involved. Philip bases her conclusions on a corpus-
driven study of connotation in non-literary language where she examines the
meaning of colour words in conventional expressions such as to see red, to feel
blue, and green with envy, and explains what factors are responsible for activating
the connotative meanings of the colour words when the expressions are used in
running text.
The second part of this volume, dedicated to Learning corpus use to translate,
is perhaps more obviously related to the issues raised by the Bologna reform as
the authors of the three chapters in this section are involved in designing teach-
ing modules using the European credit system at both undergraduate and post-
graduate level. They belong to a new generation of translator trainers who grew
up using computers, have degrees in translating and interpreting and experience
as professional translators. Corpas and Seghiri address the problem of evaluating the representativeness of corpora built as documentation resources. Sánchez-Gijón suggests a CULT-based methodology to integrate learning documentation,
corpus linguistics, and terminology in specialised translation courses. Rodríguez
Inés offers a methodology for evaluating this learning process.
Chapter 5, ‘Virtual corpora as documentation resources: Translating travel
insurance documents’, is by Gloria Corpas Pastor and Miriam Seghiri Domín-
guez. The authors stress the importance of documentation as a core subject in the
curriculum of Translation and Interpreting degrees, present a brief introduction
to the literature on corpus compilation and go on to provide a systematic meth-
odology for corpus compilation based on electronic resources available on the
Internet. The authors also describe their own software application, ReCor, which
enables accurate evaluation of corpus representativeness by measuring lexical density (the ratio of types to tokens, i.e. of the number of different words in a text to its total number of words). The corpus is representative if the lexical density does not change when more texts are added. The protocol and ReCor are
illustrated through the example of the creation of a virtual corpus of travel insur-
ance documents in English and Spanish, which is later tested for representative-
ness. Finally, the pedagogical applications of this research are stressed and some
specific examples are given of possible uses in B-A/A-B translations of travel in-
surance documents.
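The representativeness criterion described above can be sketched in a few lines of Python. This is not ReCor, whose implementation is not described here. It is a minimal illustration that rests on three assumptions. First, texts are added one at a time. Second, the type/token ratio of the growing corpus is recomputed after each addition. Third, "does not change" is treated as a change smaller than an arbitrary tolerance over the last few additions. The tokenizer, the window and the tolerance are all assumptions, as are the function names.

```python
import re

def tokenize(text):
    # Naive tokenizer (assumption): lower-cased runs of word characters.
    return re.findall(r"\w+", text.lower())

def cumulative_ttr(texts):
    """Type/token ratio of the growing corpus after each text is added."""
    types, n_tokens, ratios = set(), 0, []
    for text in texts:
        tokens = tokenize(text)
        types.update(tokens)
        n_tokens += len(tokens)
        ratios.append(len(types) / n_tokens if n_tokens else 0.0)
    return ratios

def looks_stable(ratios, window=3, tolerance=0.01):
    """True if each of the last `window` additions changed the ratio by < tolerance.

    The window and tolerance are illustrative assumptions, not ReCor's settings.
    """
    if len(ratios) <= window:
        return False  # too few additions to judge
    recent = ratios[-(window + 1):]
    return all(abs(b - a) < tolerance for a, b in zip(recent, recent[1:]))

print(cumulative_ttr(["a b", "a c"]))  # [1.0, 0.75]
```

The ratio depends on the order in which texts are added. A more careful check might therefore repeat the computation over several random orderings of the same texts.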
In Chapter 6, ‘Developing documentation skills to build do-it-yourself corpo-
ra in the specialised translation course’, Pilar Sánchez-Gijón defends the use of do-
it-yourself corpora in the specialised translation class with a proposal for a CULT-
based methodology to integrate documentation, corpus linguistics, terminology
and translation skills. She starts with the specialised translator’s needs, the role of
documentation in the curriculum and the advantages of creating do-it-yourself
corpora and improving search strategies to retrieve relevant texts. The example
used to illustrate her proposal includes suggestions not only for solving terminol-
ogy problems, but also textual problems involving the target text reader, contras-
tive rhetoric and the degree of formality required when translating from English
to Spanish.
In Chapter 7, ‘Evaluating the process and not just the product when using
corpora in translator education’, Patricia Rodríguez Inés is also concerned with
the demands of the translation profession and believes that the reforms implicit
in the Bologna Declaration and the European Space for Higher Education (e.g.
promoting curriculum innovation based on learning outcomes, profession-ori-
ented learning objectives, lifelong learning etc.) should help trainee translators to
face these demands successfully: to develop expert knowledge and competences,
to gain autonomy and be able to find strategies to solve new problems using new
technologies. The chapter begins by justifying the theoretical and methodologi-
cal framework chosen for a task-based proposal for teaching the use of electronic
corpora to trainee translators. However, the main contribution is a proposal for
evaluating the learning process, not just the final product, by recognising good
practices, appropriateness, quality and acceptability. The proposed evaluation is
part of a teaching unit for final year undergraduate students, ‘Ingredients for my
corpus: quality texts’. It is designed to build up responsibility and autonomy and
the evaluation includes self-assessment, peer assessment and teacher assessment.
Both aspects of CULT, Corpus use for learning to translate and learning cor-
pus use to translate, are a real possibility in most European translation faculties,
with increasingly sophisticated computers and software specially designed to
make the most of the enormous possibilities of existing corpora and the Inter-
net. However, CULT should always be part of a pedagogically sound syllabus in
which all aspects of education are taken into account. CULT is only one aspect of
a translator’s training and, despite the technological advances, time is needed to
train corpus users in good practices and to give them the knowledge and the tools
to build reliable, representative corpora. We think that the time is well spent and
hope that this book will encourage ‘novice’ CULT teachers to experiment as well
as suggest a few new ideas to the ‘experts’.
References
Aston, G. 1999. “Corpus Use and Learning to Translate”. Textus XII: 2 (special issue: “Translation
Studies Revisited”): 289–314.
Aston, G. 2000. “Corpora and language teaching”. In Rethinking language pedagogy from a cor-
pus perspective, L. Burnard and T. McEnery (eds), 7–17. Bern: Peter Lang.
Baker, M. 1993. “Corpus Linguistics and Translation Studies – Implications and Applications”.
In Text and Technology. In Honour of John Sinclair, Mona Baker, Gill Francis and Elena
Tognini-Bonelli (eds), 233–252. Amsterdam/Philadelphia: John Benjamins.
Beeby, A. 1996. Teaching Translation from Spanish to English. Ottawa: Ottawa University Press.
Bernardini, S. 2000a. “Systematising serendipity: Proposals for concordancing large corpora
with language learners”. In Rethinking Language Pedagogy from a Corpus Perspective,
L. Burnard and T. McEnery (eds), 225–234. Bern: Peter Lang.
Bernardini, S. and Zanettin, F. 2000. I corpora nella Didattica della Traduzione – Corpus Use and
Learning to Translate. Bologna: CLUEB.
Corpas Pastor, G. 2001. “Compilación de un corpus ad hoc para la enseñanza de la traduc-
ción inversa especializada”, Trans 5: 155–184. (Also available at: http://www.trans.uma.
es/Trans_5/t5_155-184_GCorpas.pdf)
González Davies, M. 2004. Multiple Voices in the Translation Classroom. Amsterdam/Philadel-
phia: John Benjamins.
Gouadec, D. 2002. Profession: Traducteur. Paris: La Maison du Dictionnaire.
Gouadec, D. 2007. Translation as a Profession. Amsterdam/Philadelphia: John Benjamins.
Hurtado Albir, A. (Dir.) 1999. Enseñar a traducir. Madrid: Edelsa.
Kiraly, D. C. 2000. A Social Constructivist Approach to Translator Education; Empowering the
Translator. Manchester: St. Jerome.
Kübler, N. 2003. “Corpora and LSP translation”. In Corpora in translator education, F. Zanettin,
S. Bernardini, D. Stewart (eds), 25–42. Manchester: St. Jerome.
Maia, B. 2003. “Training Translators in Terminology and Information Retrieval using Comparable and Parallel Corpora”. In Corpora in Translator Education, F. Zanettin, S. Bernardini and D. Stewart (eds), 43–54. Manchester: St. Jerome.
Mossop, B. 1998. “The workplace procedures of professional translators”. Paper read at the EST
Conference in Granada.
PACTE. 2003. “Building a Translation Competence Model”. In Triangulating Translation: Pers-
pectives in process oriented research, F. Alves (ed.), 43–66. Amsterdam: John Benjamins.
Rodríguez Inés, P. 2008. Uso de corpus electrónicos en la formación de traductores (inglés-es-
pañol-inglés). PhD thesis. Departament de Traducció i d’Interpretació. Universitat Au-
tònoma de Barcelona.
Stewart, D. 2001. “Poor Relations and Black Sheep in Translation Studies”. Target 12(2): 205–
228.
Varantola, K. 2003. “Translators and disposable corpora”. In Corpora in translator education, F. Zanettin, S. Bernardini, D. Stewart (eds), 55–70. Manchester: St. Jerome.
Zanettin, F. 2001. “Swimming in words: Corpora, translation, and language learning”. In Learning with corpora, G. Aston (ed.), 177–197. Bologna: CLUEB.
Zanettin, F., Bernardini, S. and Stewart, D. 2003. Corpora in Translator Education. Manchester:
St. Jerome.
Using corpora and retrieval software
as a source of materials
for the translation classroom
This article starts from a twofold distinction: that between corpora as documen-
tation tools and corpora as a source of materials for the translation classroom,
and that between corpus-based and corpus-driven approaches. Then a pedagog-
ic framework for translator training is outlined in which the notion of objective
is central and a task-based methodology is used. Within such a framework, four
kinds of corpus-related tasks are presented and illustrated: cloze tests based on
a bilingual corpus, multiple choice exercises based on a learner corpus, transla-
tion of short passages yielded by the concordancer and concordance analysis.
The first three are corpus-based, whereas the last one is more corpus-driven and
can be used to promote autonomous learning and discovery strategies.
* Research for this article has been conducted within the framework of two research projects:
HUM2006-11524/FILO, funded by the Spanish Ministry of Science and Innovation (with a
contribution from FEDER funds), and P1 1B2006-13, funded by the ‘Caixa Castelló – Bancaixa’
Foundation, as part of an agreement with the Universitat Jaume I.
10 Josep Marco and Heike van Lawick
and controls their use with a view to achieving their pedagogic objectives. As
claimed by Bernardini, Stewart and Zanettin (2003: 4):
The use of corpora in language learning contexts was pioneered by Tim Johns,
who introduced concordancing into the foreign language classroom in the
1980s. Besides enabling language professionals such as lexicographers and mate-
rial writers to produce better reference and learning materials, and allowing lan-
guage teachers to create classroom activities based on real examples, he showed
how corpora could provide learners with direct access to virtually unlimited
language data.
literary translation, with English and German as source languages and Catalan as
target language. Therefore, we will not be dealing with such problems as subject-
specific terminology or specialized genre conventions. However, before present-
ing the activities, let us look briefly at the pedagogic assumptions underlying our
proposal.
Within the field of translator training, Delisle (1980, 1993, 1998) has laid great
emphasis on the importance of the notion of learning objective when planning a
translation course. Hurtado (1999, 2001) subscribes to Delisle’s view and goes on
to identify four groups of objectives that must inform a general translation course
(2001: 167): methodological, contrastive, professional and textual. Methodological objectives have to do with the principles guiding the translation process; contrastive objectives are related to basic contrastive features of the two languages involved; professional objectives take account of the skills prospective translators need in order to enter the marketplace, i.e. to become members of the professional community they aspire to join;1 finally, textual objectives deal with the kinds of problems that arise in the
1. This is in line with Kiraly’s social constructivist approach to translator education, accord-
ing to which students at the periphery of the translation community are gradually drawn into
the community’s discourse until they are competent, full-fledged members of the community
themselves (Kiraly 2000).
2. As claimed by Kelly (2005: 12), “Delisle’s translational approach is informed by the théorie
du sens, and also partly by the Canadian contrastivist tradition of Vinay and Darbelnet, despite
his criticism of their work”.
Corpora as source for the translation classroom 13
Students are provided with the source text and the target text with gaps that they
are asked to fill in (see, for instance, Frankenberg-Garcia and Santos 2003). These
gaps target the specific problem that the teacher wants the students to deal with. The main advantage of this kind of exercise is that it allows the
class to focus on a specific translation problem, leaving aside all other aspects of a
text which, interesting as they may be, are perceived at a given moment as periph-
eral to the issue in hand. Cloze tests, needless to say, can never be the main kind
of activity carried out in a translation class, as their nature is obviously reduction-
istic; but this weakness becomes their main strength when they are regarded as a
task which enables the class to concentrate on a given translation problem in an
intensive way (see Lawick 2006).
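A cloze task of this kind can be assembled semi-automatically from an aligned bilingual corpus. The sketch below rests on two assumptions: the teacher has already aligned source and target sentences, and has marked, for each pair, the target-text segment to be blanked out. The function keeps only pairs whose source sentence contains one of the chosen cue words (here als and wenn), and replaces the marked segment with a numbered gap. The function name and the data layout are hypothetical. The sentence pairs are invented for illustration and are not taken from any corpus.

```python
import re

def make_cloze(pairs, cue_words):
    """Build cloze items from (source, target, gap) triples.

    `gap` is the target-text segment the teacher wants blanked out.
    Only pairs whose source sentence contains one of `cue_words`
    (whole word, case-insensitive) are kept.
    """
    cue = re.compile(r"\b(?:" + "|".join(map(re.escape, cue_words)) + r")\b",
                     re.IGNORECASE)
    items, answer_key = [], []
    for source, target, gap in pairs:
        if not cue.search(source) or gap not in target:
            continue  # no cue word in the source, or the gap was mis-marked
        number = len(items) + 1
        items.append((source, target.replace(gap, f"({number}) ______", 1)))
        answer_key.append((number, gap))
    return items, answer_key

# Invented sentence pairs: German source, Catalan target, segment to blank.
pairs = [
    ("Als ich nach Hause kam, schlief sie.",
     "Quan vaig arribar a casa, ella dormia.", "Quan vaig arribar a casa"),
    ("Er ist müde.", "Està cansat.", "cansat"),
    ("Wenn du willst, komme ich mit.", "Si vols, vinc amb tu.", "Si vols"),
]
items, answer_key = make_cloze(pairs, ["als", "wenn"])
for source, gapped in items:
    print(source, "->", gapped)
print(answer_key)
```

The second pair is dropped because its source sentence contains neither cue word. The answer key keeps the published renderings, so students' solutions can be compared with them afterwards.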
Appendix 1 provides two tasks dealing with the German conjunctions4 als
and wenn. Given the semantic complexity of these conjunctions, in the first task
students are asked to identify their different values and functions in different contexts. Samples have been selected from the monolingual German corpus Wortschatz-Portal,5 compiled at Leipzig University and representing real situations of use of present-day German. This large on-line corpus can be easily accessed and handled, allowing students to obtain linguistic information without spending much time or effort, and encouraging them to work autonomously and to discover for themselves that corpora offer more (and different) information than dictionaries. Students are therefore asked to look for further examples in that corpus before carrying out the cloze test.
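Corpus searches of this kind return keyword-in-context (KWIC) concordance lines. The sketch below shows one minimal way such a display can be produced. It is not the Wortschatz-Portal's software. The tokenizer (runs of word characters or single punctuation marks), the context width and the function names are assumptions, and the example sentences are invented.

```python
import re

def kwic(text, node, width=4):
    """Return (left context, node, right context) for each occurrence of `node`.

    Matching is case-insensitive on whole tokens; context is `width` tokens.
    """
    tokens = re.findall(r"\w+|[^\w\s]", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits

def show(hits, margin=25):
    """Print hits with the node word aligned in a central column."""
    for left, node, right in hits:
        print(f"{left:>{margin}} [{node}] {right}")

text = "Als er kam, war es spät. Er ist größer als ich."
show(kwic(text, "als", width=3))
```

Here the first hit is the temporal als and the second the comparative one. Sorting hits by their left or right context is a common way of making such recurrent patterns easier to see.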
The central task presents a cloze test (with Joseph Roth’s Die Flucht ohne Ende
as source text and its Catalan translation, La fugida sense fi, as target text, both
belonging to the COVALT corpus)6 in which students are expected to fill in the
gaps in the target text corresponding to the clauses introduced by als and wenn in
the source text. Thus, learners will apply what they have learned in the previous
4. Although the terms connective and conjunction are generally used as synonyms in current grammars, we prefer the latter, following the criterion of the Duden (2001), where Konjunktion (conjunction) is used to mean a lexical element connecting clauses, phrases, words or constituents of a phrase or a word.
5. The Wortschatz-Portal <http://wortschatz.uni-leipzig.de/> contains 6 million words and
offers not only a list of concordances, but also information on significant neighbours and
graphical representations of co-occurrences.
6. COVALT (Corpus Valenciano de Literatura Traducida, or “Valencian Corpus of Translated
Literature”) is a multilingual corpus, still under construction, made up of Catalan translations of narrative works originally written in English, French and German, published in the autonomous region of Valencia between 1990 and 2000. It currently includes 70 source-text/target-text pairs, amounting to about 4 million words. Corpus analysis is carried out by means of
AlfraCOVALT, a bilingual concordancing programme developed within the COVALT research
group by Josep Guzman (see Guzman, forthcoming). The COVALT group, based at Universitat
Jaume I (Castelló, Spain), has received financial support from several research projects, funded
by the “Caixa Castelló-Bancaixa” Foundation (within an agreement with Universitat Jaume I),
and by the Spanish Ministry of Science and Technology.
exercise, with the help of the translated context. But let us see why we consider
these conjunctions as a translation problem.
The conjunctions als and wenn have been singled out for study because students, unaware of their polysemy, tend to translate them automatically into Catalan as quan (“when”) and si (“if”), respectively. In fact, Ainaud, Espunya and Pujol devote a section (2003: 198–201) of their English-Catalan translation handbook to the problems arising in the translation of connectives, starting with their frequent polysemy. But the German conjunctions als and wenn pose problems to translator trainees not only because of their polysemy but also because of their partial overlap in temporal meaning. This overlap results in pragmatic ambiguity, insofar as only the context can determine which meaning prevails in each case, a phenomenon closely related to polysemy, linguistic
change, and grammaticalization (Sweetser 1990). In what follows, a brief descrip-
tion is provided of the main functions of the German conjunctions als and wenn,
as a backdrop to the tasks in Appendix 1.
The polysemy of als lies mainly in its double role as a temporal and as a modal
conjunction. In the latter function it may introduce a subordinate clause, a part of
a sentence or a word; furthermore, it may occur in complex expressions (sowohl...
als auch; insofern, als; zu... als dass). In most cases the modal als introduces a
comparison or a specification.
In its temporal sense, als usually introduces a subordinate clause referring to
events that (a) occurred in the past simultaneously with the events expressed in
the main clause, indicating a certain point in time; (b) occurred in the past before
the events expressed in the main clause; or (c) occurred in the past after the events expressed in the main clause.
The main values of wenn are time and condition, two values that are closely interrelated (Drosdowski et al. 1984: 700), and not only in German. The use of temporal
notions like before and after in order to define more abstract notions like cause and
effect has been observed by several scholars (Drosdowski et al. 1984: 697; Cuenca
1992–1993 and 1999: 173; Pérez Saldanya and Salvador 1995: 91). The conjunction
nachdem, for example, has a temporal and a causal meaning, although the latter is
no longer used in standard German. For Catalan, Salvador (2002: 2989) highlights
the semantic proximity between certain causal and conditional clauses, on the one
hand, and between the latter and temporal clauses, on the other.7
7. According to this author, the utterance Quan neva, la muntanya torna blanca (“When
it snows, the mountain turns white”) is equivalent to a generic interpretation of Si neva, la
muntanya torna blanca (“If it snows, the mountain turns white”) (Salvador 2002, Note 8).
for the change (or extension) from a temporal to a conditional value in the case of
the German conjunction wenn, as well as for the fact that the Catalan contextual
equivalent of wenn in its conditional meaning is often the temporal conjunction
quan. But the temporal sense of als is also normally translated as quan, which makes this kind of translation problem more complex. A further value of wenn (usually in combination with auch) is concessivity, corresponding to the Catalan encara que (“even if”, “even though”).8 The relationships between all these values are worked out in class beforehand using the Wortschatz corpus, which enables students to cope easily with the cloze test; the test is intended to ensure comprehension and to let them solve this kind of translation problem deliberately. In addition, the context encourages them to look for translations they might not have thought of before.
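The overlap described above, with both als and wenn often rendered as quan, can be checked roughly in an aligned corpus. One way is to count which Catalan connectives occur in the target sentence whenever a given German conjunction occurs in the source. The sketch below is a crude heuristic under stated assumptions. It works on aligned sentence pairs rather than word alignments, so it counts co-occurrence, not translation equivalence. The function name is hypothetical. The sentence pairs are invented, with one Catalan sentence borrowed from Salvador's example in note 7; they are not corpus data.

```python
import re
from collections import Counter, defaultdict

def words(sentence):
    return re.findall(r"\w+", sentence.lower())

def renderings(pairs, source_cues, target_cues):
    """For each source cue, count target connectives found in the aligned sentence.

    Heuristic (assumption): sentence-level co-occurrence only, no word alignment.
    """
    table = defaultdict(Counter)
    for source, target in pairs:
        src = set(words(source))
        tgt = " ".join(words(target))
        for cue in source_cues:
            if cue in src:
                for connective in target_cues:
                    if re.search(r"\b" + re.escape(connective) + r"\b", tgt):
                        table[cue][connective] += 1
    return table

# Invented German-Catalan sentence pairs.
pairs = [
    ("Als ich nach Hause kam, schlief sie.", "Quan vaig arribar a casa, ella dormia."),
    ("Wenn es schneit, wird der Berg weiß.", "Quan neva, la muntanya torna blanca."),
    ("Wenn du willst, komme ich mit.", "Si vols, vinc amb tu."),
    ("Auch wenn es regnet, gehen wir spazieren.", "Encara que plogui, sortim a passejar."),
]
table = renderings(pairs, ["als", "wenn"], ["quan", "si", "encara que"])
for cue, counts in table.items():
    print(cue, dict(counts))
```

Because the cue wenn also matches auch wenn, the concessive rendering encara que appears among its counts. This mirrors the spread of values described above.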
Learner corpora (see, for instance, Aston 2000; Osborne 2000; Bernardini 2004 for the use of learner corpora in second language acquisition, or Bowker and Bennison 2003, mentioned above, for learner corpora and translator training) can be used to implement multiple-choice exercises. Students are given the source text, together with some fragments from it and the corresponding fragments from different translations; they are then asked to distinguish between translations that are both correct and adequate, translations that are incorrect, and translations that are more or less correct but inadequate, for whatever reasons. The rationale behind this is, as Pym (1992) suggests, that in translation it is sometimes possible to tell right from wrong, but more often than not translation “errors” are inadequacies rather than plain mistakes. Plain mistakes are called binary errors (something can be said to be a mistake on the authority of grammar and the dictionary); inadequacies are known as non-binary errors.
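The three-way judgement students are asked to make can be modelled as a small data structure, for instance to record and compare answers when such a task is run over a learner corpus. The sketch below is purely illustrative: the category labels follow the distinction drawn above (adequate, binary error, non-binary error), while the names `Verdict` and `disagreements` and the verdicts for options a to c are invented placeholders, not assessments of any real translations.

```python
from enum import Enum


class Verdict(Enum):
    """The three judgements students make about each candidate translation."""
    ADEQUATE = "correct and adequate"
    BINARY_ERROR = "incorrect (a plain mistake)"
    NON_BINARY_ERROR = "more or less correct but inadequate"


def disagreements(student, teacher):
    """Return the option labels on which the student's verdict differs
    from the teacher's (options the student skipped count as disagreements)."""
    return sorted(label for label, verdict in teacher.items()
                  if student.get(label) != verdict)


# Invented verdicts for hypothetical options a-c, for illustration only.
teacher = {"a": Verdict.NON_BINARY_ERROR, "b": Verdict.ADEQUATE, "c": Verdict.BINARY_ERROR}
student = {"a": Verdict.BINARY_ERROR, "b": Verdict.ADEQUATE}

print(disagreements(student, teacher))
```

Counting a skipped option as a disagreement is a design choice; in class, the interesting cases are rather those where a student labels a non-binary error as a binary one (option a here), since they open the discussion of adequacy.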
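Appendix 2 illustrates this possibility in corpus-based task design. The original excerpts are taken from Graham Swift’s novel Last Orders (Swift 1996), and the translated excerpts from an ad hoc learner corpus made up of student translations of an extended passage (about 300 words) of the novel. Of course the task, as presented in the appendix, is not complete. For reasons of space, the extended passage in question is not included, so the short excerpts and their multiple-choice translations appear completely decontextualized. In the real classroom situation where the task was carried out (within the framework of a literary translation course at Universitat Jaume I), students were familiar with the novel and, therefore, with the extended passage. This kind of activity is intended to enhance translator trainees’ critical sense, as it forces them to consider different translation possibilities and give each one its due, always in a reasoned way. In this particular case, the focus is on questions of register and phraseology, as the main characters in the novel are working-class Londoners whose speech – rich in colloquial expressions and idiomatic turns of phrase – gives the text its distinctive flavour.
8. Salvador (2002: 2980) calls this kind of utterance condicionals concessives (“concessive conditionals”) (b), situated between condicionals (“conditionals”) (a) and concessives pures (“pure concessives”) (c): a) Si fa sol, l’excursió serà molt agradable (“If it’s sunny, the trip will be very pleasant”); b) Encara que no faci sol, l’excursió serà molt agradable (“Even if it’s not sunny, the trip will be very pleasant”); c) Tot i que no ha fet sol, l’excursió ha estat molt agradable (“Even though it hasn’t been sunny, the trip has been very pleasant”).
Corpora as source for the translation classroom 17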
The teacher selects different passages from among the results yielded by the concordancer for a given query, and students are asked to translate them. The advantage of this kind of exercise is again that it enables the class to concentrate on a specific translation problem through a battery of examples which the trainer deems representative of the issue in hand. It would have been possible to implement this kind of activity in the pre-corpus age, but it would have been extremely time-consuming to collect relevant examples manually.
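Retrieving such a battery of examples amounts to a keyword-in-context (KWIC) search. The following minimal sketch shows the idea for a plain-text corpus held in memory; it is not how SARA or any other concordancer works internally, the function name `kwic` and its `width` parameter are illustrative assumptions, and the sample is a short invented string whose first sentence adapts the opening of the Appendix 3 passage.

```python
import re


def kwic(text, query, width=30):
    """Return one keyword-in-context line per whole-word, case-insensitive
    match of `query`, with `width` characters of context on each side."""
    text = " ".join(text.split())  # normalise line breaks and spacing
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


sample = ("Now, this back-room was immediately behind the bar. "
          "It is too late now. Nowhere else would do.")

for line in kwic(sample, "now"):
    print(line)
```

Matching whole words only keeps out forms such as Nowhere; the teacher would then pick, from lines like these, the ones judged representative of the problem in hand.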
Appendix 3 provides an example of this kind of “translation drill”, centred upon the conjunctive meaning of English now. Samples are taken from the British National Corpus, freely accessible online through SARA (a concordancing tool). However, translation of (a selection of) query matches will be supplemented by a preparatory task (identification of the different values and meanings of English now, as in the task included in Appendix 1) and a final task (translation of a longer passage in which now, used as a discourse marker, can be seen at work in a wider perspective). Thus the three-step progression (identification of the different values and meanings of an adverb/connector ⇒ translation of at least one short passage for each meaning identified ⇒ translation of a longer passage in which one meaning of the adverb/connector is illustrated on a larger scale) allows the translation class to move from the paradigmatic axis (possible values of a given item as they would be found in a corpus-based dictionary) to the syntagmatic axis (how the item in question is embedded in a given text, how it contributes to the development of the text’s argumentative or expository patterns, etc.) and therefore bridges the gap between cross-linguistic contrast and meaning or function in context. Moreover, this textual justification ties in particularly well with a cognitive one, as the progression envisaged allows an interplay between analysis (what different meanings of now can you identify?) and synthesis (decide which meaning is activated in this instance and translate in a contextually appropriate way). As argued by Danielsson (cited in Bernardini 2004: 18), “as the units […] get longer on the syntagmatic scale, the paradigmatic choices tend to get fewer”.
The development of translation tasks focusing on different values and functions
of now was prompted by our awareness of the difficulties experienced by translator
trainees when faced with now as a discourse marker. Students are well acquainted
with the primary meaning of now, i.e. an adverb with present time reference, but
they often seem to be utterly unacquainted with the other uses of the word, which
results in incoherent renderings that are bound to distort the import of a given
argument. Therefore, the aim of the tasks in Appendix 3 is to make students aware
of the uses of now as a discourse marker, which (according to the 1995 edition of
the Collins Cobuild English Dictionary) could be paraphrased as follows:
a. “to indicate to the person or people you are with that you want their attention, or that you are about to change the subject”;
b. “when they are thinking of what to say next”;
c. “to give a slight emphasis to a request or command”;
d. “to introduce information which is relevant to the part of the story or account that you have reached, and which needs to be known before you can continue”;
e. “to introduce something which contrasts with what you have just said before”;
f. “You can say ‘now, now’ as a friendly way of trying to comfort someone who is upset or distressed”.
All these values of now are identified by the Collins Cobuild English Dictionary as belonging to pragmatics and as being typical of spoken and/or informal English, even though some of the examples included in the tasks are taken from written language. These examples illustrate the time adverb use of now and all its non-temporal uses just mentioned except b and c. These uses sometimes overlap, but they are all present to some degree in the corpus samples provided.
All of the tasks put forward so far (those included in Appendixes 1, 2 and 3) are corpus-based, insofar as they deal with contrastive features and translation problems identified by the teacher and then incorporated into the classroom materials. Their novelty lies in the fact that corpora (whether monolingual, parallel or learner) and corpus interrogation tools are used, but they might equally have been elaborated in a more traditional way, with hard-copy support – only the whole thing would have been maddeningly time-consuming. But, as suggested above, translation tasks may also be corpus-driven, with a varying amount of guidance on the teacher’s part. Let us consider some examples.
A parallel corpus such as COVALT lends itself to data-driven exploitation in several ways. The most generic one would be for the teacher to encourage students to use it as a documentation tool for any equivalence-related problem not covered by the bilingual dictionary. But the scope for autonomous research and – indeed – discovery can be narrowed and made more specific by the teacher suggesting rich points or problematic areas. In this respect, learners may be prompted to discover what techniques translators use when faced with words or multiword strings which have no readily available equivalents, or what norms govern translation decisions with regard to such problems as the translation of sub-standard forms and cultural elements.
As to the former, a case in point might be that of evaluative adjectives (Marco 2006), i.e. adjectives that convey a certain degree of evaluation on the speaker’s part and therefore reveal their attitude towards something. These adjectives often have a semantic spectrum which overlaps only partially with that of their closest equivalents in the receiving language. That would be the case of English adjectives such as grotty, chintzy or wan and their Catalan or Spanish dictionary equivalents. Another case in point is that of body language words (Marco and Guzman 2007), i.e. words – normally verbs or nouns – that convey some sort of bodily expression, such as stare, frown, shrug, sniff or gasp. Two aspects of this set of lexical units become immediately apparent when regarded from a contrastive viewpoint. Firstly, they show different degrees of lexicalization across languages: whereas in English these actions are fully lexicalized, in Catalan they are conveyed by means of more or less fixed collocations of the type arrufar les celles (“to wrinkle one’s eyebrows”) or arronsar els muscles (“to contract one’s shoulders”). Secondly, the cultural import of the gestures or body motions differs. The results that discovery procedures of this kind are likely to yield would undoubtedly go a long way towards enhancing learners’ awareness of translators’ creativity, as actual translation solutions are often difficult to predict from dictionary equivalents and show a high degree of variation.
Let us look briefly at the example provided by the English adjective dull. The main meanings of dull are paraphrased as follows by the Collins Cobuild English Dictionary: “not interesting or exciting”; “not very lively or energetic”; “not bright” (when applied to colours or light); “very cloudy” (when applied to the weather); “not very clear or loud” (when applied to sounds); “weak and not intense” (when applied to feelings); and “not sharp” (when applied to a knife or blade). These meanings are variously reflected in the English-Catalan dictionary, and the equivalents provided by the dictionary typically find their way into the translation solutions yielded by the COVALT corpus. However, not all these solutions had been foreseen by the dictionary, and it is precisely in that area of mismatch between dictionary and actual practice (as witnessed by corpus query matches) that the translator’s creativity can be seen at work. Here are some concordance results that convey a certain degree of unpredictability on the basis of dictionary information (back translations from Catalan are provided in brackets).
(1) De Quincey, Thomas. The Confessions of an English Opium-Eater / Confessions d’un opiòman anglés.
EN: should become an opium-eater, the probability is that (if he is not too DULL to dream at all) he will dream about oxen
CA: haguera acabat menjant Opi, probablement somniaria bous (llevat que, de tan soca, no somiara en absolut) (“unless he is such a blockhead that he does not dream at all”)
(2) Doyle, Arthur Conan. The Adventure of the Bruce-Partington Plans / Sherlock Holmes i els plànols del Bruce Partington
EN: The London criminal is certainly a DULL fellow,
CA: El criminal de Londres és, ben segur, un individu poc espavilat (“not a very sharp individual”)
EN: I am DULL indeed not to have understood its possibilities
CA: Sóc un vertader badoc per no haver-ne comprés les possibilitats (“I am a real fool for not having understood the possibilities”)
(3) Conrad, Joseph. Typhoon / Tifó
EN: give me the DULLest ass for a skipper before a rogue
CA: , doneu-me per patró l’ase més ase abans que un bergant (“give me the most foolish ass for a skipper before a rogue”)
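Spotting renderings that the dictionary did not foresee can be partly automated once aligned pairs have been retrieved. The sketch below assumes a list of (English, Catalan) sentence pairs and a hand-made tuple of Catalan stems standing in for the dictionary equivalents of dull; the stems, the function name `unforeseen` and the third pair are illustrative assumptions, not the entries of the English-Catalan dictionary referred to above. Pairs whose Catalan side contains none of the stems are flagged for closer reading.

```python
import re

# Illustrative stand-ins for dictionary equivalents of "dull" (an assumption):
# avorrit "boring", apagat "dim", esmussat "blunt", ensopit "listless".
DICTIONARY_STEMS = ("avorri", "apaga", "esmussa", "ensopi")


def tokens(sentence):
    """Lower-cased word tokens (accented letters count as word characters)."""
    return re.findall(r"\w+", sentence.lower())


def unforeseen(pairs, keyword="dull", stems=DICTIONARY_STEMS):
    """Return pairs whose English side contains a form of `keyword`
    (dull, duller, dullest...) but whose Catalan side contains no word
    beginning with any of the dictionary stems."""
    flagged = []
    for english, catalan in pairs:
        if not any(tok.startswith(keyword) for tok in tokens(english)):
            continue
        if not any(tok.startswith(stem) for tok in tokens(catalan) for stem in stems):
            flagged.append((english, catalan))
    return flagged


pairs = [
    ("The London criminal is certainly a DULL fellow,",
     "El criminal de Londres és, ben segur, un individu poc espavilat"),
    ("I am DULL indeed not to have understood its possibilities",
     "Sóc un vertader badoc per no haver-ne comprés les possibilitats"),
    ("It was a dull evening.", "Va ser una vetllada avorrida."),  # invented pair
]

for english, catalan in unforeseen(pairs):
    print(english, "->", catalan)
```

Stem matching is deliberately crude (it ignores, for instance, multiword equivalents); flagged pairs are only candidates, and deciding whether they show creativity remains a matter of reading them.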
are perfectly standard Catalan – and they are representative of the English-Catalan sub-corpus at large:
(4) London, Jack. White Fang / Clau blanc
EN: But we AIN’T got people an’ money an’ all the rest, like him,
CA: Perquè nosaltres no tenim família, ni diners, ni res de res, com tenia ell
EN: They jes’ know we AIN’T loaded to kill,
CA: Saben ben bé que no estem preparats per a exterminar-los
EN: An’ I’ll bet it AIN’T far from five feet long
CA: I em jugaria el coll que fa més de metre i mig de llargària
(5) London, Jack. The Cruise of the Dazzler / El creuer del Dazzler
EN: I’m ‘Reddy’ Simpson, an’ you AIN’T licked the fambly till you’ve licked me
CA: Jo sóc Panotxa Simpson i no hauràs guanyat la família fins que no em derrotes
EN: Bli’ me, if ‘ere they AIN’T snoozin’,
CA: – Que em pengen si no estan roncant
(6) London, Jack. The Call of the Wild / La crida del bosc
EN: You AIN’T going to take him out now
CA: No deu voler soltar-lo ara, veritat
In the area of cultural elements, on the other hand, it might be interesting to find out how street names, for instance, fare in translation. In English-Catalan translation, there is no generally agreed-upon way of dealing with such nouns as Street, Avenue, Road, etc. when they are part of a proper noun; therefore they are sometimes translated (as carrer, avinguda and so on) and sometimes left untouched in the target text. However, it would be highly informative to determine quantitatively whether or not there is a prevailing tendency in contemporary translation practice. This is just to illustrate how corpus evidence can help throw light on some contrastive features and translation problems. Of course many more areas of difficulty could be illuminated in a similar way.
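A quantitative check of this kind could start from a simple count over aligned sentence pairs. The sketch below is only an illustration under stated assumptions: the function name `street_strategies` is hypothetical, it looks for a capitalised word followed by Street in the English sentence, and it then records whether the Catalan side keeps Street, uses carrer, or does something else. The sentence pairs are invented, and a real study would need manual checking of every match.

```python
import re
from collections import Counter


def street_strategies(pairs, generic="Street", equivalent="carrer"):
    """Count how aligned Catalan sentences handle '<Name> Street' in the
    English source: 'retained', 'translated' (uses carrer) or 'other'."""
    source_pattern = re.compile(r"\b[A-Z]\w*\s+" + re.escape(generic) + r"\b")
    counts = Counter()
    for english, catalan in pairs:
        if not source_pattern.search(english):
            continue  # no street name in this source sentence
        if re.search(r"\b" + re.escape(generic) + r"\b", catalan):
            counts["retained"] += 1
        elif re.search(r"\b" + re.escape(equivalent) + r"\b", catalan, re.IGNORECASE):
            counts["translated"] += 1
        else:
            counts["other"] += 1
    return counts


pairs = [  # invented examples
    ("He lived in Baker Street.", "Vivia a Baker Street."),
    ("They walked down Oxford Street.", "Van baixar pel carrer d'Oxford."),
    ("She left Fleet Street at noon.", "Va deixar el carrer de Fleet a migdia."),
    ("It was a quiet street.", "Era un carrer tranquil."),
]

counts = street_strategies(pairs)
total = sum(counts.values())
for strategy, n in counts.most_common():
    print(f"{strategy}: {n}/{total}")
```

The same function could be rerun with Avenue and avinguda, and so on; comparing the resulting proportions across translators or periods is what would reveal a prevailing tendency, if there is one.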
Finally, learner corpora might also be used in an inductive way as part of comparable corpora including professional work. This would parallel the kind of work envisaged by Bernardini (2004) for the language learner. According to this author (2004: 19), who draws on previous work done in the area of Computer Assisted Language Learning,
if learners are presented with concordances showing the typical errors they (statistically) appear to make, and with similar textual environments where the same structure is used appropriately, they may find it easier to become aware of more or less fossilized characteristics of their interlanguage, thus potentially initiating a process of knowledge restructuring.
It would be possible, along the same lines, to present the translator trainee with a comparable corpus made up of two components: a learner component, consisting of translations produced by trainees in an academic setting, and a professional component, consisting of published translations, i.e. the output of professional work. Such a corpus would be comparable in its target-text component, as it would place the production of learners and professionals side by side; but it would be still more useful if it were parallel as well, that is to say, if it included the corresponding source texts. As argued by Bernardini for the language learner, the translator trainee’s awareness of typical mistakes or errors might very well be enhanced by comparing professional and learner output.
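A first comparison between the two components could be as simple as normalised frequencies of selected target-language items. The sketch below assumes each component is available as a list of plain-text translations; the function names are hypothetical, the two one-sentence "corpora" are invented toy data, and the choice of quan as the item to compare is only an example.

```python
import re
from collections import Counter


def per_thousand(texts, word):
    """Occurrences of `word` per 1,000 tokens across a list of texts."""
    tokens = [t for text in texts for t in re.findall(r"\w+", text.lower())]
    if not tokens:
        return 0.0
    return 1000 * Counter(tokens)[word.lower()] / len(tokens)


def compare_components(learner, professional, words):
    """Map each word to its (learner, professional) normalised frequency."""
    return {w: (per_thousand(learner, w), per_thousand(professional, w)) for w in words}


learner = ["Quan va passar, quan ens vam separar."]      # toy learner data
professional = ["Va ser aleshores que ens vam separar."]  # toy professional data

for word, (lrn, pro) in compare_components(learner, professional, ["quan"]).items():
    print(f"{word}: learner {lrn:.1f}, professional {pro:.1f} per 1,000 tokens")
```

Normalising per 1,000 tokens lets components of different sizes be compared; a marked difference only indicates where to look, and the concordance lines themselves would then be examined side by side.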
4. Conclusion
The obvious conclusion to be drawn from this article is that translator training
can benefit a great deal from what corpora and corpus analysis have to offer. The
emphasis has been on corpus use and learning to translate (as opposed to learning
corpus use to translate). Furthermore, corpus resources have been envisaged –
mainly but not exclusively – as sources for classroom materials and activities rather
than documentation tools. Emphasis on the former is due to the fact that the latter
has attracted more attention so far, and we have sought to redress the balance as
far as possible with corpus-based tasks for the translation classroom; but we have
also pointed out ways in which corpus-driven activities could be implemented
(either inside or outside the class). Cloze tests, multiple choice exercises based on
learner corpora and translation of short fragments yielded by concordancers are
examples of corpus-based activities; concordance analysis based on COVALT, for
instance, can give rise to semi-autonomous or fully autonomous corpus-driven
tasks, only partly (if at all) guided by the teacher, focusing on particular problems
or sets of problems. The more highly skilled the student is in the use of corpora,
the more autonomous their work can be. However, this sort of skill – like many
others – is not acquired overnight, and it makes sense to move along the two axes
(corpus-based and corpus-driven work) simultaneously, so that the student can
gradually shift from the former to the latter. Only a certain degree of competence
on the trainee’s part can justify the claim that “[t]he greatest pedagogic value of
the instrument [i.e. corpora] lies, we suggest, in its thought-provoking, rather than
question-answering, potential” (Bernardini, Stewart and Zanettin 2003: 11).
References
Ainaud, J., Espunya, A. and Pujol, D. 2003. Manual de traducció anglès-català. Vic: Eumo.
Aston, G. 2000. “Corpora and language teaching”. In Rethinking Language Pedagogy from a
Corpus Perspective, L. Burnard and T. McEnery (eds.), 7–17. Frankfurt: Peter Lang.
Bernardini, S., Stewart, D. and Zanettin, F. 2003. “Corpora in Translator Education: An Introduction”. In Corpora in Translator Education, F. Zanettin, S. Bernardini and D. Stewart (eds.), 1–13. Manchester: St. Jerome.
Bernardini, S. 2004. “Corpora in the classroom: An overview and some reflections on future
developments”. In How to Use Corpora in Language Teaching, J. M. Sinclair (ed.), 15–36.
Amsterdam/Philadelphia: John Benjamins.
Bowker, L. and Bennison, P. 2003. “Student Translation Archive and Student Translation Tracking System: Design, Development and Application”. In Corpora in Translator Education, F. Zanettin, S. Bernardini and D. Stewart (eds.), 103–117. Manchester: St. Jerome.
Collins Cobuild English Dictionary. 1995 [2nd edition]. London: HarperCollins.
Cosme, C. 2006. “Clause combining across languages. A corpus-based study of English-French
translation shifts”. Languages in Contrast 6(1): 71–108.
Cuenca, M. J. 1992–1993. “Sobre l’evolució dels nexes conjuntius en català”. Llengua i Literatura
5: 171–213.
Cuenca, M. J. 1999. Introducción a la lingüística cognitiva. Barcelona: Ariel.
Delisle, J. 1980. L’analyse du discours comme méthode de traduction. Ottawa: Editions de
l’Université d’Ottawa.
Delisle, J. 1993. La traduction raisonnée. Manuel d’initiation à la traduction professionnelle de
l’anglais vers le français. Ottawa: Editions de l’Université d’Ottawa.
Delisle, J. 1998. “Définition, rédaction et utilité des objectifs d’apprentissage en enseignement
de la traduction”. In Los estudios de traducción: un reto didáctico, I. García Izquierdo and
J. Verdegal (eds.), 13–43. Castelló: Servei de Publicacions de la Universitat Jaume I.
Deutscher Wortschatz, Universität Leipzig, Institut für Informatik: http://wortschatz.uni-leipzig.de/.
Drosdowski, G. et al. (eds.). 1984. Duden. Grammatik der deutschen Gegenwartssprache.
Mannheim: Bibliographisches Institut.
Frankenberg-Garcia, A. and Santos, D. 2003. “Introducing Compara, the Portuguese-English Parallel Corpus”. In Corpora in Translator Education, F. Zanettin, S. Bernardini and D. Stewart (eds.), 71–87. Manchester: St. Jerome.
González Davies, M. 2004. Multiple Voices in the Translation Classroom. Amsterdam/Philadelphia: John Benjamins.
González Davies, M. (coord.). 2003. Secuencias. Tareas para el aprendizaje interactivo de la
traducción especializada. Barcelona: Octaedro-EUB.
Guzman, J. (forthcoming). “El uso de COVALT y AlfraCOVALT en el aprendizaje traductor”.
In Actas del XXIV Congreso Internacional de la Asociación Española de Lingüística Aplicada
(AESLA).
Guzman, J. and Serrano, A. 2006. “Alineamiento de frases y traducción: AlfraCOVALT y el
procesamiento de corpus”. Sendebar 17: 169–186.
Helbig, G. and Buscha, J. 1989 [12th ed.]. Deutsche Grammatik. Ein Handbuch für den Ausländerunterricht. Leipzig: VEB Verlag Enzyklopädie.
1. Identify as many meanings of als and wenn as you can in the following excerpts (temporal, modal, or conditional, establishing their specific role in each case):
DE: In diesen Tagen begann Tunda, alle unbedeutenden Ereignisse niederzuschreiben, es war, als bekämen sie dadurch eine gewisse Bedeutung.
CA: En aquests dies, Tunda començà a apuntar tots els esdeveniments indiferents, ________________ significat.
DE: Wahrscheinlich wäre ich sehr glücklich, wenn sie geruhen würde, mir einen Auftrag zu geben.
CA: Probablement em faria molt feliç ________________ manar-me algun encàrrec.
DE: Es war der erste Satz, den sie direkt an mich gerichtet hatte, und sie sah mich nicht an, als wollte sie zu erkennen geben, daß sie, auch wenn sie zu mir sprach, nicht gerade unbedingt und nur zu mir sprach.
CA: Era la primera frase que m’ha dirigit directament, sense mirar-me, ________________ donar-me a entendre que, ________________ parlava, no ho feia ni incondicionalment, ni exclusivament a mi.
DE: Ich kann ihr auch Geld geben, wenn Du willst, daß sie zu Dir kommt.
CA: Puc donar-li diners, ________________ reunir-se amb tu.
DE: Der Fremde gab an, daß er im Jahre 1916 als österreichischer Oberleutnant in ein sibirisches Kriegsgefangenenlager gekommen war.
CA: El foraster va declarar que, l’any 1916 ________________ el van dur a un camp de presoners siberià.
1. Choose the best translation for each of the following excerpts and justify your choice. Say whether the other options are correct or incorrect and/or more or less adequate.
1. So we shuffle on, down some steps and up some steps, past all these geezers made of stone,
lying face up, flat out, out for the count.
a. Ens vam posar a trescar, escales amunt i avall, passant per aquells vells xarucs fets de pedra,
ajaguts d’esquena tan llargs com eren, a punt per passar revista.
b. Continuem caminant a desgana, baixem i pugem escales, passem per davant de tots aquests
individus fets de pedra, gitats panxa enlaire, fatigats, fets pols.
c. Així que vàrem continuar deambulant, uns passos amunt, uns passos avall. Vàrem passar
tots aquests homenots fets de pedra que estaven gitats cap amunt, tots tirats i dormint com
un tronc.
d. Així que, arrossegant els peus, continuem caminant esglaons amunt i esglaons avall per
davant de tots aquests vells xarucs de pedra que reposen boca per amunt, fora de combat.
e. Aleshores continuem caminant arrossegant els peus, amunt i avall unes quantes passes, per
davant de tots aquests homenots fets de pedra que estan gitats cara amunt, molt cansats,
fora per al compte.
2. I reckon that’s when it really happened, that’s when we really parted company, though it
wasn’t till later, till she teamed up with that Tyson toe-rag, then started taking on all-comers,
that I washed my hands altogether, did a Vic.
a. Supose que quan va succeir és quan, en realitat, vam dividir l’empresa, ja que no va ser fins
més tard, fins que ella es va associar amb aquell fanfarró d’en Tyson, quan va començar a
veure-se-les amb tots els adversaris, que em vaig llavar les mans del tot igual que Vic.
b. Crec que va ser llavors quan va passar, quan realment ens vam separar, encara que no va
ser fins un poc més tard, fins que es va associar amb aquell pocavergonya de Tyson i van
començar a prendre part tots els contendents, que em vaig desentendre, vaig fer com Vic.
c. Crec que va ser aleshores quan va passar, quan ens vam separar de veritat, encara que no
va ser fins temps després, quan ella es va ajuntar amb aquell mamarratxo musculat al més
pur estil Tyson, quan començaren a esdevenir totes les adversitats del món. Però jo no vaig
rentar-me les mans, com va fer Vic.
d. Crec que tot ve d’aleshores, sí, va ser aleshores que ens vam distanciar de debò, tot i que no
va ser fins més endavant, quan es va ajuntar amb el poca-vergonya i després va començar
a ajuntar-se amb el primer que arribava, que vaig rentar-me’n les mans de tot plegat, mans
netes com Vic.
‘Show us the tap, and give us a bit of cold meat and a drop of beer while yer inquiring, will yer?’
said Noah.
Barney complied by ushering them into a small back-room, and setting the required viands be-
fore them; having done which, he informed the travellers that they could be lodged that night,
and left the amiable couple to their refreshment.
Now, this back-room was immediately behind the bar, and some steps lower, so that any person
connected with the house, undrawing a small curtain which concealed a single pane of glass
fixed in the wall of the last-named apartment, about five feet from its flooring, could not only
look down upon any guests in the back-room without any great hazard of being observed (the
glass being in a dark angle of the wall, between which and a large upright beam the observer
had to thrust himself), but could, by applying his ear to the partition, ascertain with tolerable
distinctness, their subject of conversation. The landlord of the house had not withdrawn his eye
from this place of espial for five minutes, and Barney had only just returned from making the
communication above related, when Fagin, in the course of his evening’s business, came into
the bar to inquire after some of his young pupils.
Dominic Stewart
University of Macerata, Italy
This paper discusses a module taught to final-year students at the School for Interpreters and Translators at Forlì, University of Bologna, examining the role of semantic prosody in translation from English to Italian and the way in which we as corpus analysts use our intuitions about language to seek insights into semantic prosody and to convert corpus data into evidence of semantic prosody. These issues are considered primarily from the point of view of a teacher of translation wishing to sensitise students to the opportunities afforded by corpora for translators.
Introduction
Yet, as Louw (1993) has pointed out, prosodies may be more subtle than this, possessing an almost subliminal quality which may not be readily perceived by the hearer/reader/translator. In this respect, corpora have been considered crucial, enabling or helping us to identify, through corpus data, the lexical profile of any given term. Since semantic prosody has been studied almost exclusively within the domain of corpus linguistics, it seems legitimate in this context to ask just how corpora can provide translators with insights into semantic prosody. Yet there is little research, within translation studies or corpus linguistics, into how we as analysts actually read or interpret corpus data, i.e., how we convert corpus data into evidence. In this regard, the role of the user’s intuitions in corpus investigations is of considerable importance, particularly as intuition tends to be played down in studies on corpus linguistics, where it is often considered speculative and untrustworthy by comparison with the tangible, empirical world of computerised collections of data.
Two main issues thus emerge: the role of semantic prosody in the translation process and how we intuitively convert corpus data into evidence of semantic prosody. I shall consider these issues primarily from the point of view of a teacher of translation wishing to sensitise students to the opportunities afforded by corpora for translators.
The teaching in question, a module taught to final-year (fourth-year) Italian students at the School for Interpreters and Translators at Forlì, University of Bologna, involved corpus analysis of certain phrases and expressions drawn from a passage by James Joyce, in order to identify possible semantic prosodies.
Section 1 of the present article furnishes a brief review of studies on semantic
prosody in both corpus linguistics and translation, while Section 2 outlines the
methodology adopted and gives the findings. Section 3 offers a discussion of the
methodology used, as well as critical reflections upon the way the corpus analysis
was carried out.
Over the last twenty years or so semantic prosody has aroused considerable atten-
tion within corpus linguistics. Interest in the subject was initially kindled in the
late 1980s by Sinclair’s observations about the lexico-grammatical environment
of the phrasal verb SET in, later reiterated in Sinclair (1991: 74). Using a corpus of
around 7.3 million words, the author makes the following observation about this
verb’s grammatical subjects:
Safeguarding the lexicogrammatical environment 31
The most striking feature of this phrasal verb is the nature of its subjects. In general, they refer to unpleasant states of affairs … The main vocabulary is rot, decay, malaise, despair, ill-will, decadence, impoverishment, infection, prejudice, vicious (circle), rigor mortis, numbness, bitterness, mannerism, anticlimax, anarchy, disillusion, disillusionment, slump. Not one of these is conventionally desirable or attractive (ibid: 74–75).
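Sinclair's observation concerns the words that occur as subjects of the phrasal verb. A crude approximation is to collect the word immediately preceding set in, allowing for an auxiliary such as has or had. The sketch below is an assumption-laden illustration, not Sinclair's procedure: the regular expression and the function name `set_in_left_words` are hypothetical, the sample text is invented, and the pattern also catches literal uses such as the sun set in the west, which is exactly the kind of match an analyst has to weed out by reading the lines.

```python
import re
from collections import Counter

# Word before (has/had/have) set/sets/setting in: a rough proxy for the subject.
PATTERN = re.compile(r"\b(\w+)\s+(?:(?:has|had|have)\s+)?(?:set|sets|setting)\s+in\b")


def set_in_left_words(text):
    """Count the word immediately to the left of SET in (or of its auxiliary)."""
    return Counter(m.group(1) for m in PATTERN.finditer(text.lower()))


sample = ("Rot had set in. Then decay sets in quickly. "
          "The sun set in the west. Despair set in.")

for word, n in set_in_left_words(sample).most_common():
    print(word, n)
```

The left-neighbour heuristic misses longer subjects (a period of decline set in) and includes false positives such as sun, so the counts can only suggest a prosody; judging that rot, decay and despair share an unpleasant character is left to the analyst.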
Later in the same work the author (ibid: 112) notes, within the framework of his
idiom principle, that “Many uses of words and phrases show a tendency to occur
in a certain semantic environment. For example the word happen is associated
with unpleasant things – accidents and the like”.
Sinclair’s reading of semantic prosody is to be understood within his model of the extended lexical unit, which integrates collocation, colligation, semantic preference and semantic prosody. For example, in Sinclair 1996 (84–91) the author analyses the lexical items (a) the naked eye, for which he posits a prosody of ‘difficulty’ on account of its frequent co-occurrence with sequences such as barely visible to the, too faint to be seen with, invisible to, and (b) true feelings, for which he claims a prosody of ‘reluctance’, i.e., reluctance to express our true feelings, on account of co-occurrences such as will never reveal, prevents me from expressing, less open about showing, guilty about expressing.
The pragmatic implications of semantic prosody are made explicit in the following:
A semantic prosody…is attitudinal, and on the pragmatic side of the semantics /
pragmatics continuum. It is thus capable of a wide range of realisation, because
in pragmatic expressions the normal semantic values of the words are not neces-
sarily relevant. But once noticed among the variety of expression, it is immedi-
ately clear that the semantic prosody has a leading role to play in the integration
of an item with its surroundings. It expresses something close to the ‘function’
of an item – it shows how the rest of the item is to be interpreted functionally.
(Sinclair 1996: 87–88)
The term ‘semantic prosody’ itself first gained currency in Louw (1993), and was based upon a parallel with Firth’s discussions of prosody in phonological terms. In this respect Firth was concerned with the way sounds transcend segmental boundaries. The exact realisation of the phoneme /k/, for example, is dependent upon the sounds adjacent to it. The /k/ of cat is not the same as the /k/ of key, because during the realisation of the consonant the mouth is already making provision for the production of the next sound. Thus the /k/ of cat prepares for the production of /æ/ rather than /i:/ or any other sound, by a process of “phonological colouring” (Louw ibid: 158). In the same way, it has been claimed, an expression such as symptomatic of (ibid: 170) prepares (the hearer / reader) for what
32 Dominic Stewart
Over the last few years there have been a number of studies on semantic prosody
within a contrastive framework. These include Xiao and McEnery’s (2006) com-
parison of prosodies of near-synonyms across English and Chinese, and Berber-
Sardinha’s (2000) analysis of English and Portuguese, both of which conclude that
collocational behaviour and semantic prosodies of near-synonyms are unpredict-
able across the two language pairs, in some cases being quite similar and in others
quite different. What emerges clearly from the two studies, however, is that such
phenomena should receive far more attention in pedagogy (language teaching,
translation teaching, dictionary compilation) than is currently the case. Similar-
ly, in Munday’s (forthcoming) cross-linguistic analysis of semantic prosodies in
comparable reference corpora of English and Spanish, the author advocates more
earnest collaboration between translation studies theorists, monolingual corpus
linguists and software developers. He also makes the important point that corpus
data are particularly useful to translators (in this case, to translators working into
their mother tongue) because:
Safeguarding the lexicogrammatical environment 33
the translator may be aware of the general semantic prosody of target text al-
ternatives (since these are in his/her native language) even if he/she may be less
sensitive to subtle prosodic distinctions in the foreign source language.
The module was taught to a class of 25 final-year students, all of whom were Ital-
ian. It consisted principally of the following:
– two lessons were devoted to textual analysis and discussion of a passage from
James Joyce’s story The Dead
– students were given a week to translate the passage into Italian
– following submission of their translations, a series of lessons were given on
semantic prosody and its possible relevance to the passage from Joyce, focus-
ing particularly, with the aid of corpus data, on three sentences or parts of
sentences from that passage
– students were then asked to re-translate the three sentences in the light of our
discussions of semantic prosody
– a comparison was then made of the translations ‘before and after’ the discus-
sions of semantic prosody, to see if awareness of prosodies had affected the
students’ translations in any way.
The passage analysed is the final part of Joyce’s The Dead, the last story in the
collection Dubliners. It is an extraordinarily lyrical, mournful, allusive passage,
heavy with symbolism, and this is in part why I chose it: aside from the beauty of
its language, its allusive nature seemed to provide a suitable springboard for the
analysis of semantic prosody. Further, the students were already studying Dublin-
ers as part of a literature course.
In the passage in question the character Gabriel is sitting in a dark hotel room
late at night after a party with family and friends in Dublin. His wife is asleep on
the bed while he reflects upon the past and present, and more specifically upon
the fact that many years before, his wife, as she has just revealed to him, had loved
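another man before she met Gabriel. The sentences analysed in detail in the
classroom, set in bold in the original, are ‘The air of the room chilled his
shoulders’, ‘Other forms were near’ and ‘The time had come for him to set out on
his journey westward’: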
The air of the room chilled his shoulders. He stretched himself cautiously along
the sheets and lay down beside his wife. One by one they were all becoming shades.
Better pass boldly into the other world, in the full glory of some passion, than fade
and wither dismally with age. He thought of how she who lay beside him had locked
in her heart for so many years that image of her lover’s eyes when he had told her
that he did not wish to live.
Generous tears filled Gabriel’s eyes. He had never felt like that himself towards
any woman, but he knew that such a feeling must be love. The tears gathered more
thickly in his eyes and in the partial darkness he imagined he saw the form of a
young man standing under a dripping tree. Other forms were near. His soul had
approached that region where dwell the vast hosts of the dead. He was conscious of,
but could not apprehend, their wayward and flickering existence. His own identity
was fading out into a grey impalpable world: the solid world itself which these dead
had one time reared and lived in was dissolving and dwindling.
A few light taps upon the pane made him turn to the window. It had begun to
snow again. He watched sleepily the flakes, silver and dark, falling obliquely against
the lamplight. The time had come for him to set out on his journey westward.
Once the students had submitted their initial translations, I introduced them to
the notion of semantic prosody, with the assistance of data from the British Na-
tional Corpus, using the three highlighted sentences / phrases. The methodology
I used was as follows:
The great majority appear to convey the idea of fear, for instance:
the lack of warmth in the smile chilled her. ‘I’ve given much
The grim hostility in his eyes chilled her. ‘OK, I’ll explain.
with a sinking feeling that chilled her more than any explosion
as something in his tone that chilled her even more. It was the
His threat was enough to chill her blood. ‘Poor Paige’. It became
minister’s mind and preferably chill his blood. It is possible to
and saw something that chilled his blood. Like some animated corpse
It took them unaware, chilling their blood. The reply, a weird
correctly Rachel Mortimer, chilled my soul to the marrow.) The
was so damned determined it chilled her to the very marrow of her bones
rope, the garrotte – they don’t chill my heart. Poison, however, is
Are you here? The thought chilled his mind – ‘Are you a ghost?’
aggressive determination which chilled her to the bone. ‘Take care’,
over like some succulent titbit chilled her to the bone. Moving on
doing that?’ The cool threat chilled her to the bone. She swallowed
around her like an icy fist, chilling her to the bone. There had to be
Concordance 1. ‘chill=VERB followed by my / your / his / her / its / our / their’. Span 5.
Non-random selection of 12/46
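To make the query notation and the layout of Concordance 1 more concrete, the following Python sketch shows one way of retrieving forms of chill followed by a possessive within a span of 5 tokens, and of printing each hit as a KWIC (key word in context) line. It is not the BNC software used in the study. The word lists, the function names find_hits and kwic, and the whitespace tokenisation are illustrative assumptions. Because the sketch matches surface forms rather than POS-tagged lemmas, it cannot separate chill=VERB from the noun: it would also catch, say, ‘caught a chill in his back’.

```python
# Hypothetical word lists: a real BNC query would use lemma and POS tags
# (chill=VERB) rather than a fixed list of surface forms.
CHILL_FORMS = {"chill", "chills", "chilled", "chilling"}
POSSESSIVES = {"my", "your", "his", "her", "its", "our", "their"}


def find_hits(tokens, span=5):
    """Return indices of CHILL forms followed by a possessive within `span` tokens."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() in CHILL_FORMS:
            window = [t.lower() for t in tokens[i + 1:i + 1 + span]]
            if any(t in POSSESSIVES for t in window):
                hits.append(i)
    return hits


def kwic(tokens, node_index, width=30):
    """Format one key-word-in-context line with the node word in a fixed column."""
    left = " ".join(tokens[:node_index])[-width:].rjust(width)
    right = " ".join(tokens[node_index + 1:])[:width].ljust(width)
    return f"{left}  {tokens[node_index]}  {right}"


if __name__ == "__main__":
    sentences = [
        "The grim hostility in his eyes chilled her .",
        "It took them unaware , chilling their blood .",
        "a chill wind blew across the bay .",  # no possessive in span: no hit
    ]
    for s in sentences:
        toks = s.split()
        for i in find_hits(toks):
            print(kwic(toks, i))
```

With a tagged corpus, the CHILL_FORMS test would be replaced by a check on lemma and part of speech, which is what the =VERB notation expresses.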
The verbs gelare, raggelare and rabbrividire frequently occur in expressions of fear.
My students suggested parallel examples such as mi ha gelato il sangue ‘it froze my
blood’; mi sono sentito gelare il sangue ‘I felt my blood freeze’; si sono sentiti rag-
gelare il sangue nelle vene ‘they felt their blood freeze in their veins’.
I selected this sentence for analysis because once again I had the impression that
despite its ostensible simplicity on both a semantic and syntactic level, there was
something sinister, almost threatening about it, yet I was unable to identify with
any degree of precision what caused this impression. Initially I made a number of
apparently fruitless searches, all of them variations upon ‘other forms were near’.
These were (=SUBST means ‘all forms of the noun’; =VERB means ‘all forms of
the verb’; an asterisk means ‘followed by’):
‘other forms’
‘other form=SUBST’
‘form=SUBST were’
‘form=SUBST * be=VERB’ (span 5)
‘were near’
‘form=SUBST * near’ (span 5)
‘were near.’
‘be=VERB * near’ (span 5)
This went on for some time until I hit upon the search ‘be=VERB * near.’ (span 5),
i.e., any form of be followed by near followed immediately by a full stop within a
span of 5 words. This resulted in 74 hits, of which around 20 looked interesting for
the type of investigation I was conducting (see Concordance 2).
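The search ‘be=VERB * near.’ (span 5) can be sketched as follows. The sketch assumes a corpus that has already been tokenised so that the full stop is a separate token, and it uses a hypothetical list of surface forms of BE in place of a lemmatiser. The function name be_near_full_stop is illustrative, and the way the BNC interface actually counts the span may differ.

```python
# Hypothetical list of surface forms standing in for the lemma query be=VERB;
# contracted 's is left out because it is ambiguous (is / has / possessive).
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}


def be_near_full_stop(tokens, span=5):
    """Find a BE form followed within `span` tokens by 'near',
    where 'near' is immediately followed by a full-stop token.

    Returns a list of (be_index, near_index) pairs.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() not in BE_FORMS:
            continue
        for j in range(i + 1, min(i + 1 + span, len(tokens))):
            if (tokens[j].lower() == "near"
                    and j + 1 < len(tokens)
                    and tokens[j + 1] == "."):
                hits.append((i, j))
                break  # one hit per BE form is enough
    return hits
```

The check on the token after near is what separates ‘My own death is near .’, which matches as (3, 4), from a banal ‘the school is near the library .’, which does not match at all.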
everyday life for the Lord is always near. The parables of Jesus promise
gather now that the last days are near.’ He looked directly at Morrsleib
plan is dead and the end may be near. ‘We don’t have a chance’, said
coming of Christ the time of Christ is near. It means that it’ll come sort
than you hate me. My own death is near. I shall leave this ship and go
an eagle loses hope then death is near.’ Her voice faded as her body
brochure Proclaiming that the end is near. Black diplomats with stately
be found. Call upon him while he is near. Let the wicked forsake his way
in God’s presence. Our salvation is near. As we wait for Christ to be
Messiah and that the end of the world is near. Federal agents bathed the cult
Messiah and that the end of the world is near. The cult has been barricaded in
and that to see if anyone was lurking near. ‘Ambush – you know, surprise
the main post. The bombers were very near. A crash in the direction of the
too, knew that their last hour was near. Many of them were the scourings
ended up with a newspaper. The end was near. It came in March with the
new organs, Kelly realised her time was near. ‘I was driving Kelly to
dangerous of all possible enemies, man, was near. After ten minutes on
the King that the Day of Judgement was near. It was only after the
who, at 70, knew that his end was near. Mr Gorbachev, you feel, is only
mad certainty that a day of reckoning was near. And underneath these
It will be noticed that many of the grammatical subjects of BE refer to the end of
something, for instance the end of someone’s life, the end of the world, the end
of time. Left-hand co-occurrences include death, storm, dangerous, hate, lurking,
enemies, loses hope. More generally there is an archaic, biblical quality to the oc-
currences listed, with references to Christ, Jesus, salvation, the coming of Christ, let
the wicked forsake, the Day of Reckoning, the Day of Judgement, impending doom.
However, occurrences of this type represent only around a third of the total, so that
if one wished to suggest a prosody of doom or something similar, it would have to
be acknowledged that the prosody is not especially strong. At the same time, the
prosody connects powerfully not only with the lugubrious atmosphere of the final
pages of The Dead but also with the atmosphere of the story as a whole.
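Deciding whether a prosody is strong amounts, in part, to estimating what share of the concordance lines carry the relevant meaning. The sketch below computes such a share automatically from a hypothetical list of cue words. This is an assumption made purely for illustration: the judgement reported above (around a third of the total) was made by reading the lines, and a fixed cue list would miss many relevant lines and admit some irrelevant ones.

```python
import re

# Hypothetical cue words for a 'doom' prosody, taken loosely from the
# co-occurrences discussed above (not a list used in the study).
DOOM_CUES = {"death", "end", "judgement", "reckoning", "salvation", "doom", "hour"}


def has_cue(line, cues=DOOM_CUES):
    """True if any whole word in the concordance line is a cue word."""
    words = re.findall(r"[a-z]+", line.lower())
    return any(word in cues for word in words)


def prosody_share(lines, cues=DOOM_CUES):
    """Proportion of concordance lines containing at least one cue word."""
    if not lines:
        return 0.0
    return sum(has_cue(line, cues) for line in lines) / len(lines)
```

Applied to two lines from Concordance 2 (‘My own death is near.’, ‘The end was near.’) and the two banal lines with fairly and too quoted below, the function returns 0.5. Whole-word matching matters here: ended does not count as end.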
The other BNC searches carried out (listed above) in order to investigate
‘other forms were near’, though similar, produced quite different concordances.
The presence of the full stop after near, for example, was crucial, since without it
there was a high percentage of more banal occurrences such as ‘…near the library’,
‘…near the school’, or ‘near’ with a place name. Further, the ‘sinister’ cases tend
to occur when BE is followed directly by near, i.e., when near is not qualified by,
for example, very, quite, reasonably, too etc., since the presence of a qualifier again
tends to produce more banal occurrences such as:
but most of the time, she’s fairly near. Efforts rewarded She will
weeks by then you Yeah yeah. It’s too near. So the thirtieth of September
of everyday life for the Lord is always near. The parables of Jesus promise
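The distinction between BE followed directly by near and BE followed by a qualifier can be expressed as a simple classification of each hit. In this minimal sketch, the qualifier list and the function name classify_hit are illustrative assumptions. The list contains the four adverbs named above plus fairly and always from the example lines. The input is a tokenised line plus the positions of the BE form and of near.

```python
# Hypothetical qualifier list: very, quite, reasonably, too (named in the text)
# plus fairly and always (found in the example lines).
QUALIFIERS = {"very", "quite", "reasonably", "too", "fairly", "always"}


def classify_hit(tokens, be_index, near_index):
    """Label a BE ... near hit by what, if anything, stands between the two."""
    between = [t.lower() for t in tokens[be_index + 1:near_index]]
    if not between:
        return "direct"     # e.g. "death is near"
    if any(t in QUALIFIERS for t in between):
        return "qualified"  # e.g. "she is fairly near"
    return "other"          # e.g. "was lurking near"
```

The classification records form only. ‘The bombers were very near.’ in Concordance 2 would be labelled qualified although it is hardly banal, which fits the observation that qualifiers merely tend to produce more banal occurrences.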
In their retranslations the group tried to account for the prosody discussed, fa-
vouring primarily
Altre forme/figure/sagome erano vicine,
in part because according to the students the syntax of this solution, with vicine
functioning as an adjective at the end of the sentence, recalls expressions such as
la morte è/era vicina ‘death is/was near’.
Some students tried to reproduce the archaic quality of the sentence by re-
placing vicino with accanto, which also means ‘near’:
Altre forme gli erano accanto
Accanto a lui vi erano altre forme
However, it was acknowledged that accanto was not ideal in that it appears to be
associated with pleasant rather than unpleasant states of affairs.
iii. The time had come (for him to set out on his journey westward)
The BNC data (173 hits for ‘time has come’ and 91 hits for ‘time had come’) suggest
that the two expressions are associated with change, with personal resolve and with
a new beginning, but not noticeably with death or destiny. Having presented the
data to the class and discussed them in some detail, I then informed the students that
notwithstanding the corpus data, I remained convinced that the string ‘the time
had come’ in the passage examined is unsettling and ominous, in part because it
links up with the death-ridden implications of the “journey westward” (towards
the setting sun) and indeed of The Dead as a whole.
Many of the initial translations had included:
era arrivato il momento/il tempo…
era venuto il tempo/momento…
era arrivata l’ora…
In the retranslations there was a marked tendency to try to incorporate the sup-
posedly sinister associations of ‘the time had come’, in particular via the use of the
verb giungere, which was considered to convey a more archaic, biblical quality:
era giunto il momento/il tempo
era giunta l’ora
For reasons of time the module did not include work with corpora of Italian,
which might well have proved useful as a check on the students’ introspective
considerations about their native language, but the principal objective was to cre-
ate awareness of the potential usefulness of corpus data in their foreign language,
where their introspections would be less reliable.
3.1 Intuition
The final lesson of the module was devoted to a critical discussion of the corpus in-
vestigations conducted, i.e., how revealing the searches were in identifying seman-
tic prosody, and to what degree they helped to fill intuitive gaps in our knowledge
of language. These are important questions because in corpus studies, above all in
the context of semantic prosody, intuition gets a bad press, often being described as
‘unreliable’, ‘inaccurate’, ‘chancy and unreliable’, ‘notoriously thin’ and ‘a poor guide’
(to semantic prosody). For example, Xiao and McEnery (2006: 103) begin their
study of semantic prosody in English and Chinese with the following premise:
We knew that our approach should be corpus-based as previous studies have
shown that a speaker’s intuition is usually an unreliable guide to patterns of col-
location and that intuition is an even poorer guide to semantic prosody.
In the remainder of this paper I shall consider the role of intuition in corpus
investigations with reference to the searches conducted above. Although my
students did not spontaneously dispute the way I had extracted and presented
the data – indeed they appeared to accept the validity of the corpus findings un-
questioningly – it could be argued that the searches entailed some highly suspect
empirical methodology.
– ‘chilled’
– ‘shoulders’
– ‘shoulders.’
– ‘chill=VERB’
– ‘chill=VERB * his shoulders’ (For the queries including an asterisk (‘followed
by’) I am assuming a span of something like 4–5.)
– ‘chill=VERB * his’
– ‘chill=VERB * his/her/your etc. shoulders’
– ‘chill=VERB * his/her/your etc.’
– ‘chill=VERB * his/her/your etc. shoulder=NOUN’
– ‘chill=VERB * his/her/your etc. shoulder=NOUN.’
These are only the ‘one way’ queries, but any number of two-way queries would
also have been possible. Further, I investigated CHILL only as a verb: was I right
to exclude the noun (e.g., catch a chill) and the adjective (a chill wind)? And I took
no account whatsoever of the opening of the sentence – ‘the air of the room’. The
same goes for my searches ‘time has come’ and ‘time had come’. Why did I restrict
myself to the perfect and past perfect tenses? Why did I not extend the investiga-
tion to ‘my time came’, ‘my time comes’, ‘my time would come’, ‘my time would
have come’, or to progressive forms such as ‘my time is coming’, ‘my time has been
coming’, ‘my time was coming’ etc.? And why did I not look at negative and inter-
rogative forms? One must assume that on an intuitive level I considered these less
likely to produce interesting results.
It could thus be claimed that my searches were massively influenced by my
personal insights and geared from the outset.
me, and systematically eliminated anything which did not. Such decisions were
clearly based upon my intuitive reactions to the data.
Naturally I would not wish to suggest that all corpus analysts adopt the search
methodologies outlined above, but the investigations conducted raise some
points which are by definition germane to CULT conferences, i.e., to corpus use,
pedagogy and translation:
– corpus users should perhaps be wary of overplaying the empirical card and
relegating intuition to the wings of corpus investigations. It could quite jus-
tifiably be argued that almost all corpus investigations, from beginning to
end, are heavily reliant upon the user’s intuitions about language, determin-
ing why we decide to consult a corpus in the first place, how we formulate
our search, how we react to the data, the criteria we use to select or ‘reverse
select’ (eliminate) concordance lines, how far we take the investigation, why
we terminate the investigation, and how we draw conclusions from the data.
Indeed, without users’ intuitions, one imagines that most corpus searches
would be relatively unproductive, if they managed to get off the ground at all.
Much has been made of the notion that semantic prosody is ‘invisible to the
naked eye’, covert, subliminal etc., but presumably there must be some sort of
intuitive trigger which activates the corpus search in the first place and which
determines how we handle the data. It is thus surprising that intuition is often
stigmatised as unreliable in corpus studies. For further discussion see Stewart
(forthcoming).
– teachers should be careful about how corpus data are presented to students. I
would underline once again that although I informed my (final-year) students
that I considered my own methodology to be questionable, and although I
urged my students to make criticisms of it, not one of them independently
came up with any cogent reason to dispute the validity of the procedures ad-
opted. It may be that generally speaking students continue to labour under
the delusion that the teacher is always right, but perhaps more insidious in
the current context is the delusion that the contents of a corpus, since they
are ‘real’, are always ‘right’. Students are likely to be so blinded by the extraor-
dinary abundance and availability of the data that it may not occur to them to
question the strategies used to reach and to interpret those data.
– learners, since by definition they have fewer intuitions about the foreign lan-
guage than native speakers of that language, are more likely to approach the
foreign-language data with an open mind, with a clean slate, so to speak, and
may thus have a better chance of letting the corpus data speak for themselves.
At the same time, if one takes the view that corpus investigations are inex-
tricably bound up with intuitive user behaviour, then it could be argued that
the fewer intuitions one has, the more problematic it is to make productive
searches. It may not occur to the learner, or to the non-native speaker in gen-
eral, that there could be anything of prosodic interest in expressions such as
‘chilled his shoulders’, ‘other forms were near’ and ‘the time had come’. If this
is the case, the learner will either not explore the prosody at all, or make blind
searches which could prove fruitless and frustrating.
– translators may find themselves in a quandary. If corpus investigations can
help translators, are we actually willing to be helped by them, or will we do
no more than select the data that best suit or confirm our own perceptions /
preconceptions? (see Tymoczko 1998: 657–658). Further, there is the risk that
corpus evidence of something as mercurial as semantic prosody may actually
dishearten translators, serving only to lend weight to the notion of a supposed
impossibility of translation. Take the prosody of ‘doom’ associated with BE *
near at the end of a sentence. We have only just taken on board the notion that
References
Ana Frankenberg-Garcia
Instituto Superior de Línguas e Administração, Lisboa,
Fundação para a Computação Científica Nacional, Portugal
1. Introduction
What translators should and what they shouldn’t do with texts has been a mat-
ter of controversy since Cicero (and later St. Jerome) first made reference to the
word-for-word versus sense-for-sense dichotomy. In recent years, however, there
has been a change of emphasis in translation studies away from the debate over
what translators ought to do and towards descriptive studies of what practicing
professional translators generally do. The shift of focus is beneficial to translator
education. Instead of being swamped with prescriptive dos and don’ts, trainee
translators who are made aware of regular features of translated texts can use
this knowledge to make their own conscious and informed decisions during the
translation process.
The present study uses corpus technology to revisit one of the more widely
discussed characteristics of translated texts: the phenomenon of explicitation.
48 Ana Frankenberg-Garcia
2. Explicitation
As Portuguese is marked for gender, the translator in example (1) was forced to
discriminate between a female and a male doctor. Obligatory explicitation can
also occur in the reverse direction. Example (2) illustrates two different aspects of
obligatory explicitation in the translation of Portuguese into English. First, while
the Portuguese possessive pronoun sua agrees with the possessed noun pele, its
English equivalent her agrees with the possessor. This means that while the Portuguese
reader has no means of telling that the skin in the text belongs to a female, the
English translator was forced to make the connection explicit. Second, since
All examples were taken from the COMPARA corpus, available at
www.linguateca.pt/COMPARA/. Letter and number codes identify the source/translation
pair plus the alignment unit in question.
Are translations longer than source texts? 49
Portuguese is a pro-drop language, the reader will read on and still not know
whether the person whose nose is ‘the most voluminous one in the world’ is a
man or a woman. As English is not a pro-drop language, the translator had to
insert the pronoun she, making it once again clear to the reader that the person
in question is a female.
(2) PBMR1 575
Source […] sua pele lembrava a crosta lunar e tinha o nariz mais
volumoso do mundo […]
Literally […] his/her skin reminded one of the lunar crust and Ø had
the most voluminous nose in the world […]
Translation […] her skin resembled the lunar crust and she had the most
voluminous nose in the world […]
As shown in example (4), exactly the same can occur in the translation of English
into Portuguese.
(4) EBDL3T2 799
Source “It’s probably Rummidge.
Translation – Então é provável que seja Rummidge.
Back Translation “So it’s probably Rummidge.
Voluntary explicitation is being used here as an all-embracing term that covers all
explicitation that is not obligatory, from the explicitation of syntactically optional
elements and markers of cohesion to the explicitation of cultural information. In
example (5), the translator made the interrogative form more explicit by adding
a question beginning that was not present in the source text, and used a footnote
50 Ana Frankenberg-Garcia
to add information about the use of a quote from Shakespeare and even about
Shakespeare’s birthplace.
(5) EBDL3T2 332
Source «All’s Well That Ends Well?» he snaps back, quick as a
flash.
Translation – Será que é All’s well that ends well?* – ele diz rápido como
um relâmpago.
*Translation Note: Tudo está bem quando acaba bem é o
título de uma peça de Shakespeare, que nasceu em
Stratford-upon-Avon.
Back Translation Could it be All’s well that ends well?* – he says quick as a
flash.
*Translation Note: All’s well that ends well is the title of a
play by Shakespeare, who was born in Stratford-upon-
Avon.
Similarly, in example (6), the translator added a subject and a verb which had
been implicit in the source text, introduced the first name of the poet referred to
only by his last name in the source text, and inserted a footnote explaining who the
poet was and giving the title of his great epic poem.
(6) PBAA2 47
Source Em pequeno meteram-lhe na cabeça vários trechos do Camões
[...]
Literally When young they put in his head various passages of Camões
[...]
Translation When he was young, someone had crammed various passages
of Luís de Camões into his head [...]*
*Translation Note: Luís de Camões (1524–80) – Portugal’s
national poet; wrote Os Lusíadas (1572).
holds that translations tend to be more explicit than source texts, regardless of the
increase in explicitness dictated by language-specific differences.
At the beginning of the 1990s, Baker (1993) predicted that qualitative studies
such as the above could be greatly enhanced by quantitative, corpus-based
analyses of translations. Indeed, Øverås (1998) examined explicitation and im-
plicitation shifts in the English-Norwegian Parallel Corpus, and found that there
was more explicitation than implicitation in both Norwegian translated from
English and English translated from Norwegian. Using two comparable corpora,
Olohan and Baker (2000) analysed the insertion of the optional that following the
reporting verbs say and tell in data from the Translational English Corpus (TEC)
and the British National Corpus (BNC), and found that the explicitation of that
is more frequent in the English translations from the TEC than in the English
originals from the BNC.
The present study is an attempt to analyse voluntary explicitation from the
perspective of text length. Because voluntary explicitation is generally achieved
by the addition of extra words in the translated text, this study seeks to test
whether translations are likely to be longer than source texts, regardless of the
languages concerned. Using the COMPARA corpus (Frankenberg-Garcia and
Santos 2003), the length of original English and Portuguese language literary
text extracts was compared with the length of their respective translations into
Portuguese and English.
Available at http://www.linguateca.pt/COMPARA/.
Portuguese translations contained 1% fewer words than their source texts in English.
However, all that these numbers can tell us is that translators working from Portuguese
into English will probably earn more if they base their fees on the number
of words in the translated text, while those working from English into Portuguese
might be better off if they get paid by the number of words in the source text. The
above distribution of words does not shed any light on the relationship between
translation and explicitation, for it is impossible to tell the extent to which the
differences observed are due to differences between Portuguese and English or
differences between source texts and translations.
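The percentages above are signed relative differences between the length of a translation and the length of its source. The sketch below computes them for a single text and pooled over one language direction. The function names and the figures used to illustrate it are hypothetical.

```python
def percent_change(source_words, translation_words):
    """Signed percentage change in length from source text to translation."""
    return (translation_words - source_words) / source_words * 100


def direction_change(pairs):
    """Pooled change for one language direction,
    given (source_words, translation_words) pairs."""
    total_source = sum(s for s, _ in pairs)
    total_translation = sum(t for _, t in pairs)
    return percent_change(total_source, total_translation)
```

For example, percent_change(1500, 1485) gives -1.0, i.e. a translation with 1% fewer words than its source. As the paragraph above stresses, a figure of this kind cannot on its own separate language-specific differences from differences between source texts and translations.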
Claims about the relative length of texts across languages are extremely difficult
to put to the test. In a recent discussion on the Corpora List, there were over twenty
postings on the subject. The main problem seems to be that, because of the di-
verging lexico-grammatical characteristics of languages, it is complicated to de-
cide on what scale to use. Different measures will affect different languages dif-
ferently. If text length is measured in terms of number of words, for example, it is
not hard to see that whatever the criteria for counting words are, they might make
some languages seem lengthier than others. Table 2 illustrates this by means of a
few examples of how word processors count equivalent meanings in Portuguese
and English.
As can be seen, English allows for contractions like isn’t, which are not pos-
sible in Portuguese: não é. A word processor counts the former as one word
and the latter as two words. Even if contractions were to be counted as separate
words, however, there are other problems. For example, there are many com-
pound words in English, like teapot, which have to be written separately in Por-
tuguese: bule de chá. But then not everything in English is more economical than
in Portuguese. Portuguese clitics are often attached to verbs, making separate
words in English, like gave him, count as a single one in Portuguese: deu-lhe.
Also, because Portuguese is a pro-drop language, it is often the case that only one
word is required to say things that would take three or four words in English. For
Available at http://helmer.aksis.uib.no/corpora/.
example, to ask the four-word question Did you like it? in Portuguese, only one
word is required: Gostou?
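The examples in the preceding paragraphs can be checked with a crude word counter. The sketch below assumes that a word processor counts whitespace-delimited strings. This is only an approximation, since real word processors differ in how they treat hyphens, apostrophes and punctuation. The pairs are the ones discussed above.

```python
def word_count(text):
    """Approximate a word processor's count: whitespace-delimited strings."""
    return len(text.split())


# English/Portuguese equivalents discussed in the text
PAIRS = [
    ("isn't", "não é"),                # contraction vs. two words
    ("teapot", "bule de chá"),         # compound vs. phrase
    ("gave him", "deu-lhe"),           # separate object vs. attached clitic
    ("Did you like it?", "Gostou?"),   # explicit subject/object vs. pro-drop
]

if __name__ == "__main__":
    for en, pt in PAIRS:
        print(f"{en!r}: {word_count(en)}  {pt!r}: {word_count(pt)}")
```

Under this assumption the four pairs give 1 vs 2, 1 vs 3, 2 vs 1 and 4 vs 1 words. The direction of the difference therefore depends on the construction rather than on the language as a whole.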
This is not the place for an extensive contrastive analysis of the lexico-gram-
matical characteristics of the two languages. The examples seen, however, show
that word counts per se are not enough to compare text length across languages,
let alone analyse the relationship between translation and explicitation. In fact, as
example (7) indicates, a translation can be more explicit than a source text even
when it has fewer words.
(7) EBDL1T1 670
Source What have I got to complain about? (7 words)
Translation De que me queixo então? (5 words)
Back Translation What have I got to complain about then?
Conversely, example (8) illustrates how there can be an increase in words in trans-
lation without any explicitation whatsoever:
(8) PBRF1 1299
Source Fui visitá-lo. (2 words)
Literally I went to visit him.
Translation I went to visit him. (5 words)
Some postings on the Corpora List argue that character counts constitute a better
measure for comparing text length across languages inasmuch as they disregard
the morphological and syntactic problems of word counts. However, as shown in
Table 3, equivalent meanings in two languages can also vary in terms of character
length. Differences in the number of characters in source texts and translations
cannot therefore help to analyse the question of explicitation any more than word
counts can.
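The same pairs can be measured in characters. This sketch counts characters with or without whitespace; the function name and its count_spaces option are illustrative. It first normalises the text to Unicode NFC. A Portuguese letter such as ã may be stored either as one code point or as a base letter plus a combining mark, and a naive count would then treat identical texts differently.

```python
import unicodedata


def char_count(text, count_spaces=False):
    """Count characters after NFC normalisation, optionally ignoring whitespace."""
    # NFC turns 'a' + combining tilde into the single code point 'ã'
    text = unicodedata.normalize("NFC", text)
    if count_spaces:
        return len(text)
    return sum(1 for ch in text if not ch.isspace())
```

Without spaces, teapot has 6 characters and bule de chá has 9, whereas Did you like it? has 13 and Gostou? has 7. Character counts are thus as language-dependent as word counts.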
Another method for comparing text length across languages suggested in the
discussion list is morpheme counts. Indeed, as can be seen in Table 4, counting
the number of morphemes of equivalent meanings in two different languages
does seem to flatten out many of the problems of word and character counts.
However, morphemes are not only extremely difficult to count, but they are
also sensitive to obligatory lexico-grammatical differences between languages.
Thus in the examples given, teapot is made up of two morphemes, but its Portu-
guese equivalent, bule de chá, is made up of three because the preposition de has to
be inserted to link the nouns bule and chá. Likewise, the English sentence Did you
like it? has one morpheme more than its Portuguese equivalent Gostou? because
the English verb like has to be followed by an object, while its Portuguese equivalent,
gostar, doesn’t. As morpheme counts do not discriminate between the addition
of morphemes dictated by language-specific differences and the extra morphemes
that are a product of voluntary explicitation, they too are not appropriate for ana-
lysing explicitation independently of the differences between languages.
Notwithstanding these limitations, the present study works on the assump-
tion that language-dependent biases can be controlled in bi-directional analyses.
In other words, when comparing source texts and translations to find out whether
text length increases in translation, it is assumed that an analysis of the transla-
tions from language y into language z combined with an analysis of the transla-
tions from language z into language y may shed some light on the extent to which
differences in text length are due to language-dependent factors alone. Put simply,
if counting words, characters or morphemes can make texts in one language
seem comparatively shorter or longer, we believe this will affect both the
translations and the source texts of the language in question. A carefully balanced,
bi-directional sample of source texts and translations will therefore enable one to
filter out language-dependent biases, and find out whether translations are longer
than source texts regardless of the changes in text length dictated by language-
specific constraints.
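One way of seeing why a bi-directional sample can filter out language-dependent biases is to model the bias explicitly. The following is a simplified model of our own, not the procedure used in the study. Suppose that whatever measure is used inflates any text in language z by a constant factor k relative to the same content in language y, and that translation adds a length factor e in both directions. Then the translation/source ratio is k·e for y into z and e/k for z into y, and the geometric mean of the two ratios is e, with k cancelled out. The function name explicitation_factor is hypothetical.

```python
import math


def explicitation_factor(ratio_y_to_z, ratio_z_to_y):
    """Geometric mean of the translation/source length ratios in both directions.

    Assumed model: ratio_y_to_z = k * e and ratio_z_to_y = e / k, where k is a
    constant language bias of the measure and e a symmetric translation effect.
    The product of the two ratios is e**2, so the language bias k cancels.
    """
    return math.sqrt(ratio_y_to_z * ratio_z_to_y)
```

A result close to 1 means that the length differences are accounted for by the language bias alone. The model breaks down if explicitation is not symmetrical across the two directions or if the bias is not a constant factor.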
Table 5. Source texts and translations selected for text length analysis
Text ID   Author               Translator
EBDL2     David Lodge          M. Carlota Pracana
EBJB1     Julian Barnes        Ana M. Amador
EBJT1     Joanna Trollope      Ana F. Bastos
ESNG1     Nadine Gordimer      Geraldo G. Ferraz
EUHJ1     Henry James          M.F. Gonçalves
EBLC1     Lewis Carroll        Y. Arriaga, N. Videira & L. Lobo
EBOW1     Oscar Wilde          Januário Leite
EURZ1     Richard Zimler       José Lima
PBPC1     Paulo Coelho         Alan Clarke
PBMR1     Marcos Rey           Cliff Landers
PMMC1     Mia Couto            David Brookshaw
PPMC1     Mário de Carvalho    Gregory Rabassa
PPSC1     Sá Carneiro          Margaret J. Costa
PBAD1     Autran Dourado       John Parker
PBMA3     Machado de Assis     John Gledson
PPCC1     C. Castelo Branco    Alice Clemente
each, which was the approximate size of the smallest source-text sample obtained.
This was achieved simply by reducing the number of concordances retrieved for
each source text until what was left came to 1500 words or close to it.
The next step was to count how many words there were on the translation side of
the parallel concordances. To be extra rigorous in the analysis, translators’ notes
were excluded from the study such that only the main translation texts were taken
into consideration in the word counts.
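The sampling step described above can be sketched in a few lines. The text does not specify any of the following details, so they are hypothetical assumptions of this sketch. Each parallel concordance is stored as a source segment, its translation and a separate list of translator's notes. Concordances are kept in retrieval order and trimmed from the end. A word is a whitespace-delimited token. Real word-count tools may tokenize differently.

```python
from dataclasses import dataclass, field
from itertools import accumulate


@dataclass
class AlignedPair:
    """One parallel concordance. Translator's notes are kept in their own
    (hypothetical) field so that they never enter the word counts."""
    source: str
    target: str
    translator_notes: list[str] = field(default_factory=list)


def word_count(text: str) -> int:
    # Assumption: a word is any whitespace-delimited token.
    return len(text.split())


def trim_to_size(pairs: list[AlignedPair], target: int = 1500) -> list[AlignedPair]:
    """Keep the leading run of concordances whose cumulative source word
    count comes closest to `target` (ties go to the shorter run)."""
    if not pairs:
        return []
    totals = list(accumulate(word_count(p.source) for p in pairs))
    best = min(range(len(totals)), key=lambda i: abs(totals[i] - target))
    return pairs[: best + 1]


def sample_counts(pairs: list[AlignedPair], target: int = 1500) -> tuple[int, int]:
    """Return (source words, translation words) for the trimmed sample."""
    kept = trim_to_size(pairs, target)
    source_words = sum(word_count(p.source) for p in kept)
    # Only the main translation text is counted; translator_notes are ignored.
    target_words = sum(word_count(p.target) for p in kept)
    return source_words, target_words


# Tiny invented example with a small target size, for illustration only.
pairs = [
    AlignedPair("a b c", "x y"),
    AlignedPair("d e", "z w v", ["N.T. an explanatory gloss"]),
    AlignedPair("f g h i", "q"),
]
print(sample_counts(pairs, target=5))
```

With a target of 5, the first two concordances add up to exactly 5 source words, and the words in the translator's note are left out of the translation count.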
4. Results
The number of words in the 16 English and Portuguese source texts analysed
and the number of words in their corresponding translations into Portuguese and
English are summarized in Table 6.
According to the above figures, while five Portuguese translations had fewer
words than their corresponding source texts in English, the remaining eleven
translations (3 English>Portuguese and 8 Portuguese>English translations) were
all longer than their corresponding source texts. The figures also show that the
5. Conclusions
It is not uncommon to overhear in educated circles claims that some languages are
“wordier” than others, and that this is the reason why translations are longer or –
depending on the language direction – shorter than source texts. Trained transla-
tors should know better. An important goal of translator education is achieved
when trainee translators become aware of the complexity of translation. This in-
cludes becoming aware of the reasons why text length can vary from source texts
to translations. As I hope to have shown in this paper, the relationship between
translation and text length is not dictated just by the morphological and syntac-
tic differences between languages, and obligatory explicitation is something quite
different from voluntary explicitation. Translators who become aware of issues
such as these can make more conscious and more informed decisions during the
translation process.
Acknowledgements
Part of this work was done in the scope of the Linguateca project, jointly funded
by the Portuguese Government and the European Union (FEDER and FSE) under
contract ref. POSC/339/1.3/C/NAC.
References
Baker, M. 1993. “Corpus linguistics and translation studies. Implications and applications.” In
Text and Technology: In Honour of John Sinclair, M. Baker, G. Francis and E. Tognini-
Bonelli (eds), 233–250. Amsterdam/Philadelphia: John Benjamins.
Blum-Kulka, S. 1986. “Shifts of cohesion and coherence in translation.” In Interlingual and
Intercultural Communication: Discourse and Cognition in Translation and Second Language
Acquisition Studies, J. House and S. Blum-Kulka (eds), 17–35. Tübingen: Gunter Narr.
Frankenberg-Garcia, A. and Santos, D. 2003. “Introducing COMPARA, the Portuguese-English
Parallel Corpus.” In Corpora in Translator Education, F. Zanettin, S. Bernardini and
D. Stewart (eds), 71–87. Manchester: St. Jerome.
Olohan, M. and Baker, M. 2000. “Reporting that in translated English: Evidence for subcon-
scious processes of explicitation?” Across Languages and Cultures 1(2): 141–158.
Øverås, L. 1998. “In Search of the Third Code: An investigation of norms in literary translation.”
Meta 43(4): 571–588.
Séguinot, C. 1988. “Pragmatics and the Explicitation Hypothesis”. TTR: Traduction, Terminolo-
gie, Rédaction 1 (2): 106–114.
Vanderauwera, R. 1985. Dutch Novels Translated into English: The transformation of a ‘minority’
literature. Amsterdam: Rodopi.
Vinay, J. P. and Darbelnet, J. 1958. Stylistique comparée du français et de l’anglais: Méthode de
traduction. Paris: Didier.
Arriving at equivalence
Making a case for comparable general reference
corpora in translation studies
Gill Philip
University of Bologna, Italy
1. Introduction
In translation studies, multiple corpora are used to study like with like across lan-
guages. This may be achieved with translation corpora, in which the texts are the
source language (SL) text in one corpus and translations of that text in the other(s);
or, as is now more frequently the case, by using comparable corpora, composed
of SL texts of a similar scope and content in each of the language(s) concerned.
The adoption of comparable corpora has made it possible to move away from the
study of translation as a product, and to focus instead on the identification, and
reproduction in translated texts, of norms proper to the Target Language (TL)
concerned. In other words, rather than studying previous translation choices (in
a translation corpus), comparable corpora reveal how the word, phrase or term is
regarding the composition and size of the corpora, as well as their representative-
ness relative to their respective languages.
In an ideal world, all general reference corpora would follow the same design
criteria, making them of similar size, composed of similar text types in similar
proportions. In practice, however, several standard models coexist, and each has
its proponents and detractors. The Brown corpus (1 million words) and those
modelled on it, including Frown, LOB and FLOB, comprise text samples of equal
length, but as a corollary of this there are very few whole texts present in the data
set, which means that organisational features may not be adequately represented.
The British National Corpus (BNC) contains 100 million words of both spoken
and written texts (10% and 90% respectively) produced since 1964; sampling only
takes place in texts which exceed 45,000 words in length, and is intended to avoid
the risk of a single author’s idiosyncrasies skewing the data. In common with
the Brown corpus, the BNC is static, and can only be kept up-to-date through
re-issue. The corpus used in this study, the Bank of English, is a monitor corpus
which undergoes constant updating and expansion, and now comprises 450 mil-
lion words of running text.
Publicly accessible corpora for Italian are few and far between. This study
draws on the Corpus di Italiano Scritto (CORIS), which was the only Italian-language
corpus available at the time this research was carried out. The
composition of the 80-million-word CORIS is modelled on the written component
of the Longman Spoken and Written English corpus (LSWE), making it qualitatively
different from, as well as considerably smaller than, the Bank of English.
Table 1 gives an indication of the distribution of text types in the two corpora.
Table 1. Proportions of text types in the Bank of English and CORIS
Text type          Bank of English    CORIS
misc. journalism   65.5%              47.5%
general prose      17%                25%
academic prose     1.5%               12.5%
legal prose        –                  10%
ephemera           1%                 5%
spoken             15%                –
Given these differences, it might appear far-fetched to describe the Bank of
English and CORIS as comparable, but the most important consideration to bear
in mind is that they are large general reference corpora, not small, text- or
genre-specific comparable corpora. Dissimilar composition is not proof of
incomparability: in accepting that languages are anisomorphic, it should come as no
surprise that text types have different degrees of prominence and frequency of occurrence
in different cultural and linguistic contexts, and that decisions regarding the rep-
resentativeness of a general reference corpus for any language (or local language
variety) must take this fact into account. In fact, the decision to model CORIS
on LSWE was taken because its make-up was deemed more appropriate to Ital-
ian than the other contending models, most notably the BNC, LOB and Bank
of English (Rossini Favretti 2000: 51). General reference corpora are expected to
exemplify their languages in a balanced way, and just as languages are not
translations of each other, representative samples of those languages need
not mirror one another’s composition. Viewed in this light, then, the comparability
of the Bank of English and CORIS is not absolute, as it would be for two corpora
built to the same design specification, but relative: each corpus is independently
constructed to take account of language-specific features and thus to
constitute a representative sample of the language concerned.
Having justified the use of the term comparable, it is now necessary to fill in some
of the background to the applicability of comparable general reference corpora to
the translation examples to be examined in 4.1.1 and 5.1.
The research from which this paper is drawn, Philip (2003), is a corpus-driven
study of connotation in non-literary language. It examines the meaning of colour
words as found in conventional linguistic expressions such as to see red, to feel
blue, and green with envy, and explains what factors are responsible for activating
the connotative meanings of the colour words when the expressions are used in
running text. By comparing colour-word expressions with a number of near-synonyms
which display similar phraseological patternings (e.g. to catch red-handed,
to catch in the act, to catch in flagrante delicto), it can be observed that the
selection of one expression over another is largely predetermined by the situational
context which the language is describing, and is both predicted and constrained
by the regularity of patterning in the co-textual environment.
Note: Since the time this research was undertaken, the substantially larger Italian corpus
composed of texts from the Repubblica newspaper http://dev.sslmit.unibo.it/index.php has become
available to researchers (see Aston & Piccioni 2004). This corpus, taken in combination with
CORIS, would constitute a general reference corpus for Italian comparable with the written
component of the Bank of English in both size and content.
Although colour words are widely considered to be highly salient, Philip
(ibid.) demonstrates that when they occur as part of conventional, non-composi-
tional expressions, their meaning is subjected to the process of delexicalisation in
the same way as any other component of a non-compositional chunk, in accor-
dance with Sinclair’s (1991) idiom principle and Louw’s (2000) theory of progres-
sive delexicalisation. As a result, the metaphorical-connotative meaning potential
of these words remains latent, only rearing its head when the expressions undergo
creative variation.
When the canonical forms of conventional expressions are altered, the way in
which the phrase is interpreted changes radically, because the novel element has
to be integrated into the whole. In order to do this, the non-compositional phrase
is broken down into its component parts, and regains a degree of compositional-
ity. The meaning is then reprocessed to make the relationship of the novel element
to the underlying canonical form contingent; in doing so, meanings which are
normally delexicalised regain a degree of saliency and metaphorical life. This can
be observed by comparing the canonical forms in (1) and (2) with the creative
variants in (3) and (4).
(1) The gang was finally caught red-handed in an armed police ambush in
September 1992.
(2) At the time it was claimed Kerr had been caught red-handed trying to smuggle
arms to the Irish Republic.
(3) A car hi-fi thief was caught Simply Red-handed when he took a CD player into
a store owned by his victim.
(4) Mr. Green apparently had been caught scarlet-handed at his own blackmail
game. Pictures of him with Miss Scarlet were found hidden in Scarlet’s bed-
room.
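Canonical and creative forms such as those in (1)–(4) can be told apart with a simple pattern search. The regular expression below is an illustrative assumption, not the query syntax of the Bank of English or CORIS. It captures the one to three words that stand before -handed in caught ...-handed and flags anything other than red as a variant.

```python
import re

# "caught <modifier>-handed", where the modifier is one to three words.
# Illustrative pattern only; it will also match unrelated strings such as
# "caught the left-handed" and needs manual checking of the hits.
HANDED = re.compile(r"\bcaught\s+((?:\w+\s+){0,2}?\w+)-handed\b", re.IGNORECASE)


def classify_handed(sentences: list[str]) -> list[tuple[str, bool]]:
    """Return (modifier, is_canonical) for every 'caught ...-handed' hit."""
    hits = []
    for sentence in sentences:
        for match in HANDED.finditer(sentence):
            modifier = match.group(1)
            hits.append((modifier, modifier.lower() == "red"))
    return hits


# Examples (1)-(4) from the text.
EXAMPLES = [
    "The gang was finally caught red-handed in an armed police ambush in September 1992.",
    "At the time it was claimed Kerr had been caught red-handed trying to smuggle arms "
    "to the Irish Republic.",
    "A car hi-fi thief was caught Simply Red-handed when he took a CD player into a store "
    "owned by his victim.",
    "Mr. Green apparently had been caught scarlet-handed at his own blackmail game.",
]
print(classify_handed(EXAMPLES))
```

This prints [('red', True), ('red', True), ('Simply Red', False), ('scarlet', False)]. The search only detects phrase-internal variation. The co-textual priming in (4), from Green, Scarlet and blackmail, would need a separate check on the surrounding words.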
A similar phenomenon takes place when the co-text includes an element that
favours a salient interpretation within the non-compositional phrase, as is the
case with (4), which includes colour words in the proper names Green and Scarlet,
in addition to the colour-word component of blackmail. The proximity of colour
words in the phraseological core and in the co-text causes the delexical colour word
to be relexicalised, thus reactivating the salient meaning.
In both these types of variation – phrase-internal, and phrase-external – the
chunk is read as a phraseological palimpsest, the sum of the underlying conven-
tional, delexicalised meaning and the novel, salient one that is superimposed on
top of it.
Conducting such a study with reference to two languages has translation as its ul-
timate aim. Monolingual reference corpora make it possible to identify the mech-
anisms which drive creativity in both languages concerned, and the results ob-
tained demonstrate that the changes in meaning are governed by the same general
principles – the combination of delexical meaning and a contextually-relevant,
salient add-on. This knowledge provides the basis for the informed translation
of unconventional language, especially that found in literature, journalism and
advertising, where word-play and anomalous language often fall victim to the
normalisation process in translation (Kenny 2001: 65–69).
Translation involves a great deal of choice, whether explicit or implicit. Choice
implies the selection of one interpretation, expressed by a particular sequence of
words, over other possible contenders, with the aim of achieving an effect as close as
possible to that obtained in the SL text. As Halliday puts it:
The translator is aware that a given item in the source has a set of possible equiva-
lents in the target language. [S/he is] aware that these are not free variants but
they are contextually conditioned. By ‘contextually conditioned’ I do not mean
that in a given context you must choose A and cannot choose B or C, but that if
you choose A or B or C then the meaning of that choice will differ according to
what the context is. (Halliday 1992: 16)
So what is the translator’s choice based on? Expert knowledge of the languages
provides a substantial degree of intuition regarding equivalence, and language
reference books and media fill in the gaps to a certain extent; but when the trans-
lator is faced with a range of apparently synonymous possibilities, how should he
or she proceed? No two expressions are identical in meaning and function, but
the fine details of the distinctions all too often escape our conscious knowledge.
In this case, the translation network comes into play. This need not involve
the use of corpora, although both translation and comparable corpora clearly add
detail that dictionaries and glossaries are not in a position to provide. Reference to
corpus data makes it possible to identify where differences and similarities lie
across languages, thus fine-tuning the translator’s knowledge. But while interest-
ing as an academic exercise, the identification of exhaustive sets of equivalences
involves repeated passes of translation and back-translation. The example
presented in Váradi and Kiss (2001), based on a translation corpus, demonstrates
how cumbersome the procedure can be with new terms being added with each
passage from one language to the other. If the same procedure were carried out
using comparable corpora (domain-specific or otherwise representing a restrict-
ed range of the languages concerned), the resulting translation web would be less
messy and its realisation less onerous, if only because the language represented in
the corpora is more homogeneous than that found in general reference corpora.
Whichever source of data is used, the building up of translation networks
generally starts with a SL word form and a hypothetical translation in the TL (see
Tognini Bonelli 2002: 81–82 for details of this procedure using two monolingual
reference corpora, with the option of using a translation corpus, where available,
to enrich the process). As the SL word form’s patternings crystallise into function-
ally-defined “units of translation” (ibid. 80), each of these units must be matched
up with a unit in the TL. However, the unit in the TL may be a leaky correspon-
dence, either being too specific to cover all uses of the SL unit, or, conversely,
having a wider range of application than its SL equivalent. In the former case,
further units in the TL have to be identified; in the latter, the distinct senses in the
TL unit have to be matched to new units in the SL. The network, or “translation
web” (Tognini Bonelli 2001: 150–154), becomes more complex with every process
of translation and back-translation as more and more terms are added and con-
nected up with their equivalents in a potentially never-ending cycle.
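The way a translation web grows can be sketched as a breadth-first traversal of a bilingual lookup table, in which each round is one pass of translation or back-translation. The lexicon here is a toy with abstract placeholder terms, not real equivalences. It shows only that a TL unit wider than its SL counterpart (it:a2) pulls new SL terms into the web, and these in turn pull in new TL terms.

```python
def translation_web(seed: str, lexicon: dict[str, list[str]]):
    """
    Grow a translation web from `seed` by repeated translation and
    back-translation. `lexicon` maps a term to its candidate equivalents in
    the other language. Returns (terms reached, edges, new terms per round).
    """
    terms = {seed}
    edges: set[tuple[str, str]] = set()
    frontier = [seed]
    growth: list[int] = []
    while frontier:
        next_frontier = []
        for term in frontier:
            for equivalent in lexicon.get(term, []):
                edges.add((term, equivalent))
                if equivalent not in terms:
                    terms.add(equivalent)
                    next_frontier.append(equivalent)
        growth.append(len(next_frontier))
        frontier = next_frontier
    return terms, edges, growth


# Toy lexicon with abstract placeholder terms (not real equivalences):
# "en:" terms are English, "it:" terms Italian.
TOY_LEXICON = {
    "en:A": ["it:a1", "it:a2"],
    "it:a1": ["en:A"],
    "it:a2": ["en:A", "en:B"],   # it:a2 is wider than en:A ...
    "en:B": ["it:a2", "it:b1"],  # ... so back-translation adds en:B
    "it:b1": ["en:B", "en:C"],
    "en:C": ["it:b1"],
}

terms, edges, growth = translation_web("en:A", TOY_LEXICON)
print(len(terms), len(edges), growth)
```

The run prints 6 10 [2, 1, 1, 1, 0]. Six terms and ten directed links grow from a single seed, and the process stops only because the toy lexicon is finite.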
One way to place a control on the network is to work on the basis that the SL
word is one of several members of a larger semantic set, and as such it is dis-
tinguished and distinguishable from a range of near-synonyms. In this way, the
analysis of the patternings of the SL term and its near-synonyms is carried out
before embarking on the translation process. If the same procedure is applied
to the posited TL equivalent term, i.e. that it is considered as a member of an
analogous paradigm, for which the various patternings have to be identified, then
the location of translation equivalents becomes a matter of matching up pattern-
ings, rather than searching for new expressions every time a new pattern appears.
The correspondences are more detailed and accurate, and the tangled web of
translations can be replaced by a more robust and linear schema of one-to-one
correspondences that are arrived at independently of which language is to be
considered the source or the target.
Note: In particular, the reader is referred to the schematic representation of sorrow and its
translation into German (Váradi and Kiss 2001: 169).
Note: This represents the full paradigm of translations found in Ragazzini 1995. It should be
noted that this is not an exhaustive list of every possible comparable expression, and it excludes
paraphrase.
Note: The analysis of this data set made it evident that semantic prosodies cannot be identified for
each node, but are specific to larger units of meaning which include the node and the particular
collocational patternings which form around it: for this reason they are identified last of all.
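The paradigm-based alternative can be sketched as a matching problem. Each unit of meaning is represented by a set of patterning features, and SL and TL units are paired greedily by Jaccard similarity. Jaccard similarity is a measure chosen here for simplicity, not one proposed in this chapter. The feature labels and the 0.5 threshold are hypothetical. Units left unmatched mark the points where a related paradigm has to be explored, as happens below with turn red and plant collocates.

```python
def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Share of features two units have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_units(sl_units: dict[str, frozenset[str]],
                tl_units: dict[str, frozenset[str]],
                min_similarity: float = 0.5):
    """
    Greedy one-to-one matching of SL and TL units by Jaccard similarity of
    their patterning features. Returns (matches, unmatched SL, unmatched TL).
    """
    candidates = sorted(
        ((jaccard(fs, ft), s, t)
         for s, fs in sl_units.items()
         for t, ft in tl_units.items()),
        reverse=True,
    )
    matches, used_sl, used_tl = [], set(), set()
    for score, s, t in candidates:
        if score < min_similarity:
            break  # remaining candidates are weaker still
        if s not in used_sl and t not in used_tl:
            matches.append((s, t, score))
            used_sl.add(s)
            used_tl.add(t)
    unmatched_sl = [s for s in sl_units if s not in used_sl]
    unmatched_tl = [t for t in tl_units if t not in used_tl]
    return matches, unmatched_sl, unmatched_tl


# Hypothetical feature labels, for illustration only.
sl = {
    "turn red (people)": frozenset({"subject:human", "sense:blush", "cause:emotion"}),
    "turn red (plants)": frozenset({"subject:plant", "sense:ripen"}),
}
tl = {
    "diventare rosso (people)": frozenset({"subject:human", "sense:blush", "cause:emotion"}),
}
print(match_units(sl, tl))
```

The plant-related unit finds no partner. In the paradigm approach, that gap is a signal to open a neighbouring paradigm rather than to add terms to an ever-growing web.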
With its requirement for detailed analysis of members of a semantic set rather
than of a single term, the paradigmatic model may give the impression of be-
ing perhaps unnecessarily time-consuming, but it should be remembered that it
is proposed as an alternative to the existing – and considerably more onerous –
method involving successive stages of translation and back-translation. If the in-
tention is to compile some sort of translation database or to improve translators’
reference works, then the corpus approach gives the most comprehensive account
of how cotextual features contribute to the building up of meaning. It can provide
extremely detailed information about how the words in question combine, the
units of meaning that they generate, their textual positioning and their extra-lin-
guistic function; all these aspects are potentially necessary to the translator.
Word profiling of the sort discussed here can be done manually, automati-
cally or through a combination of both. The precise approach taken depends on
the time available, the potential of the analysis tools, and indeed the corpus itself, as
some can only be interrogated through their built-in query software, which may
limit the degree to which the analysis can be automated. The profiling discussed
in this paper was carried out mainly by hand, the choice being determined by the
corpora used: both the Bank of English and CORIS are only available by remote
access, and can only be interrogated by their built-in query software.
Manual profiling is time-consuming, but generally highly accurate, as the
human analyst is able to recognise semantic relations between collocates more
easily than a computer can. Automatic profiling software is very sophisticated
and detailed, but is limited in the extent to which it can cope with semantic
relations. To date, no application can go beyond taxonomic semantic relations
(i.e. hierarchical relations, lexical and semantic sets) to address the kind of ad hoc
relations, based on shared attributes, which humans create and interpret freely
(see Glucksberg and Keysar 1993). When a regular pattern is found at the
abstract level of semantic preference rather than in the concrete realm of countable,
recurrent word-forms, the ability to appreciate such relations is especially
important. Humans also find it easier to spot long-distance collocates (Siepmann
2005), where the unit of meaning extends considerably beyond the concordance
line visible on the computer screen; they are also able to make sense
of incomplete text fragments or fractured phraseological patterns (Moon 1998),
such as humorous exploitations of idiomatic expressions where the original is
truncated or modified.
Note: One such application is Sketch Engine (Kilgarriff and Tugwell 2002, Kilgarriff et al. 2004),
which runs on a variety of corpora in different languages http://www.sketchengine.co.uk/.
Manual profiling can of course be aided by corpus tools which guide the ana-
lyst towards particular patterns and phenomena. The “picture” option in the Bank
of English’s suite of tools (see Krishnamurthy 2000: 36–39) gives an overview of
collocational frequency between n–3 and n+3 around the search term, and was
used extensively in the analysis of the English data in this study; most PC concor-
dance packages now include very sophisticated tools for calculating collocations,
patterns, n-grams and so on by frequency. As noted above, frequency counts and
string-searches have their limitations, but they make initial profiling quick and
reliable, with manual intervention confined to verification, fine-tuning and trou-
ble-shooting.
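A positional collocate table of the kind described above can be sketched in a few lines. It counts which words occur at each position from n-3 to n+3 around the node. This is not the Bank of English software. It assumes pre-tokenized input and reports raw frequencies only, without any statistical weighting that dedicated tools may add.

```python
from collections import Counter


def positional_collocates(tokens: list[str], node: str, span: int = 3) -> dict[int, Counter]:
    """
    For every occurrence of `node`, count the word found at each offset from
    n-span to n+span (offset 0, the node itself, is skipped). Matching is
    case-insensitive. Returns {offset: Counter of word forms}.
    """
    node = node.lower()
    words = [t.lower() for t in tokens]
    table = {off: Counter() for off in range(-span, span + 1) if off != 0}
    for i, word in enumerate(words):
        if word != node:
            continue
        for off, counts in table.items():
            j = i + off
            if 0 <= j < len(words):
                counts[words[j]] += 1
    return table


# Invented toy sentence, for illustration only.
tokens = "he was caught red-handed and she was caught in the act".split()
for off, counts in positional_collocates(tokens, "caught").items():
    print(f"n{off:+d}", counts.most_common(2))
```

Reading the table column by column shows, for instance, that was occurs twice at n-1. Manual inspection then decides whether such positional regularities form part of a larger unit of meaning.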
In all attempts at pattern matching there will inevitably be some forms that ap-
pear not to have an equivalent, at least insofar as the paradigms studied are con-
cerned. This is the case with the sense of turn red that collocates with leaves and
berries. Although turn red can nearly always be translated as diventare rosso, this
form in Italian never occurs with plant collocates to give the meaning “ripening”,
nor do any of its near-synonyms. In such a case as this, a new, related paradigm
can be opened up for exploration (ripen and maturare, with their synonyms). On
the other hand, should no translation be found to be appropriate, then, as Baker
reminds us, “[a] certain amount of loss, addition, or skewing of meaning is often
unavoidable” (1992: 57). The recurring phrase arrossire fino alle radici dei capelli
(literally, “to blush to the roots of one’s hair”) is one such case in point. The
translator should try to find an equivalent English expression (taking the verb as the
base form) or use a paraphrase; in either case, the choice must combine the sense
of blushing (including its semantic prosody) and the emphasis on extent: blush
deeply, go bright red, turn beetroot. A quasi-literal rendition would be marked
in English, and unless a particular effect was being sought, this would not be an
appropriate translation solution for a form which is unmarked in Italian. An un-
translated borrowing would only be appropriate if attention were deliberately be-
ing drawn to the Italianness of the original. A literal translation with gloss might
be appropriate in a commentary, but is unlikely to be so in narrative.
The adoption of a paradigm in translation adds a further degree of consciousness
to the translation process. The translator becomes aware of the language choices
made by the author, and can thus not only find the most accurate translation, but
also note the differences between this term and the others which could have been
used, but were not. This notion takes on particular
importance when the language being translated differs from the norm – either in
extreme cases such as the translation of poetry, or in the day-to-day inventiveness
that characterises normal language use. Peculiarities and deviations from the SL
norm can be assessed in relation to that norm and replicated in the TL, in full
consciousness rather than by mere instinct. This means that the translation can
match the effect of the original, because the mechanisms governing the effect can
be identified and reproduced.
Using general reference corpora as an aid to the translation process means
using data which makes it possible to assess and compare norms across the lan-
guages involved. Translated text does not fail utterly in this role, but it bears the
sign of translation choices already made – for good or ill. With normalisation
prevalent in the translation of (apparently) atypical language, it is useful to be
able to compare, as Kenny does (2001: 125ff.), translation corpus data with com-
parable corpus data, and to do so both for the SL and the TL.
Colour words are typically used in European languages to express emotional states,
mainly because there is a fairly transparent metonymical connection between, for
instance, adrenaline speeding up the flow of blood through the body, and the face
becoming flushed or red. So it is justifiable to expect that colour-word expres-
sions should be used to refer to the manifestation of emotion in several languages.
What may come as something of a surprise, however, is that the colour words
typically used are not necessarily the same. For example, within Europe, English
is odd in that it associates the colour green with envy, when other languages prefer
yellow, the colour of bile; and blue meaning depressed (yet grey depressing) is far
from universal. But the non-equivalences do not end here.
Setting aside any cultural reasons why colours and emotional states should
not correspond exactly (see Niemeier 1998; Philip 2003: 151–164), the fact
remains that there is a degree of language variation in this area, and that a
translator should be in a position to address it appropriately. Consider the following
corpus extracts (5)–(7), in which rabbia (rage) is assigned different colours – nero
(black), viola (purple), and verde (green).
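(5) Chi sta vicino al Castel de’ Britti, dice che è nero di rabbia, che sogna la rivincita.
    [People close to Castel de’ Britti say that he is black with rage, that he dreams of revenge.]
(6) È viola di rabbia, una furia scatenata.
    [He/she is purple with rage, an unleashed fury.]
(7) Quando mia nonna le ha risposto: “Speriamo di no, altrimenti verrà fuori una
    puttana come te!”, ho visto mia madre diventare verde di rabbia.
    [When my grandmother answered her, “Let’s hope not, otherwise she’ll turn out a
    whore like you!”, I saw my mother turn green with rage.]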
10. The single example of viola di rabbia was located on the Internet; there were no occurrences
in the CORIS data.
What are the implications of this for translation? If the corpus data shows that
nero is commonly used in this pattern but black is not, how should the transla-
tor proceed? At this stage the principles of delexicalisation in conventionalised
phraseology come back to centre stage. Nero di rabbia is unmarked, and the term
which is correspondingly unmarked in English is red with anger/rage. In matching
these expressions, the translator must set aside the salient meaning of the colour word
in favour of the unmarked phraseological meaning, which in this case is equivalent.
The same is true for verde di rabbia, again an unmarked form. Should there
be text-internal reasons for considering the colour to be relevant, the translator
could use the alternative, livid; but if there are no special circumstances to take
into consideration, then again red would serve to translate verde. The anomalous
viola di rabbia would be inaccurately rendered by an unmarked form such as
purple with rage, so some alternative rendering would be desirable; the most likely
course of action would be to move away from the basic colour terms (Berlin &
Kay 1969) and select a particular shade such as plum, puce or even regal purple.
In doing so, the colour is still conveyed accurately, and the effect of the SL original is
preserved because the phrase is not normalised.
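CORIS (80 million words) and the Bank of English (450 million words) differ greatly in size, so any frequency-based judgement of whether a form is marked needs normalised counts. The sketch below normalises raw counts per million words and applies a markedness rule. The per-million threshold and the two-way strategy choice are hypothetical simplifications of the reasoning above, not a procedure from the study. In practice, text-internal reasons can override frequency, as the discussion of verde di rabbia shows.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Normalise a raw frequency to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus size must be positive")
    return raw_count * 1_000_000 / corpus_size


def is_marked(raw_count: int, corpus_size: int, threshold_pm: float = 0.1) -> bool:
    """Hypothetical rule of thumb: a form rarer than `threshold_pm`
    occurrences per million words is treated as marked."""
    return per_million(raw_count, corpus_size) < threshold_pm


def strategy(sl_raw_count: int, sl_corpus_size: int) -> str:
    """Unmarked SL phrase -> unmarked TL phrase; marked SL phrase -> a marked
    TL rendering (e.g. a specific shade instead of a basic colour term)."""
    if is_marked(sl_raw_count, sl_corpus_size):
        return "marked TL rendering"
    return "unmarked TL equivalent"


CORIS_SIZE = 80_000_000              # words, as given in the text
BANK_OF_ENGLISH_SIZE = 450_000_000   # words, as given in the text

# viola di rabbia: no occurrences in CORIS (footnote 10).
print(strategy(0, CORIS_SIZE))
# An invented raw count of 40 means very different things in the two corpora.
print(per_million(40, CORIS_SIZE), per_million(40, BANK_OF_ENGLISH_SIZE))
```

The same raw count of 40 works out at 0.5 occurrences per million in CORIS but under 0.09 in the Bank of English. This is why raw counts from the two corpora cannot be compared directly.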
Creative use of language may well make up only a small proportion of the lan-
guage that is translated every day, but it is important both culturally and linguisti-
cally for a translator to render it in an appropriate manner. If the innovative and
marked can be compared to related, unmarked forms in the SL, then the transla-
tor’s job is facilitated greatly. By considering conventional language as largely del-
exicalised, and innovative language as being a combination of a delexical support
and a contextually relevant addition, it is possible to go about achieving the same
effect in the TL by adhering to the same principles, essentially re-creating the TL
text in the same way as the SL text was constructed. In order to do this, however,
the translator must have access to a large quantity of data from which to identify
language norms, and that data comes in the form of general reference corpora.
Smaller corpora are simply inadequate when it comes to dealing with stretches
of text, fixed and semi-fixed phrases, and less-frequently used words and expres-
sions, though they serve a fundamental role in the identification of genre-related
phenomena.
6. Discussion
The use of comparable general reference corpora as an aid to the translation process
is often one-sided. TL corpora are commonly used as a control to ensure that the
translation produced sounds natural, but less use is made of corpora in assessing
the naturalness of the SL original. While it must be acknowledged that absolute
References
Aston, G. and Piccioni, L. 2004. “Un grande corpus di italiano giornalistico.” In Atti del conve-
gno nazionale AitLA, G. Bernini, G. Ferrari and M. Pavesi (eds). Perugia: Guerra. Available
from http://www.sslmit.unibo.it/~guy/aitla_repubblica.htm (accessed 25 March 2008).
Baker, M. 1992. In Other Words: A Coursebook on Translation. London/New York: Routledge.
Berlin, B. and Kay, P. 1969. Basic Color Terms: Their Universality and Evolution. Berkeley: Uni-
versity of California Press.
Biber, D. 1993. “Representativeness in Corpus Design.” Literary and Linguistic Computing 8(4):
243–257.
Firth, J. R. 1968. “A Synopsis of Linguistic Theory, 1930–55.” In Selected Papers of J. R. Firth
1952–1957, F. R. Palmer (ed.), 168–205. London/Harlow: Longmans.
Glucksberg, S. and Keysar, B. 1993. “How metaphors work.” In Metaphor and Thought (2nd and
revised edition), A. Ortony (ed.), 401–424. Cambridge: Cambridge University Press.
Halliday, M. A. K. 1992. “Language Theory and Translation Practice.” Rivista internazionale di
tecnica della traduzione 0 (pilot issue): 15–25.
Kenny, D. 2001. Lexis and Creativity in Translation. A Corpus-based Study. Manchester: St.
Jerome.
Kilgarriff, A. and Tugwell, D. 2002. “Sketching words.” In Lexicography and Natural Language
Processing: A Festschrift in Honour of B. T. S. Atkins, Marie-Hélène Corréard (ed.), 125–
137. Göteborg: EURALEX.
Kilgarriff, A., Rychly, P., Smrz, P. and Tugwell, D. 2004. “The Sketch Engine.” In Proceedings of
the Eleventh EURALEX International Congress, 105–116. Lorient: Université de Bretagne-
Sud.
Krishnamurthy, R. 2000. “Collocation: From silly ass to lexical sets.” In Words in Context: A
Tribute to John Sinclair on his Retirement, C. Heffer and H. Sauntson (eds), 31–47. Bir-
mingham: The University of Birmingham.
Laviosa, S. 1997. “How Comparable Can ‘Comparable Corpora’ Be?” Target 9(2): 289–319.
Louw, W. E. 2000. “Some implications of progressive delexicalisation and semantic prosodies
for Hallidayan metaphorical modes of expression and Lakoffian ‘Metaphors we Live By’.”
Privately-distributed version of “Progressive delexicalization and semantic prosodies as
early empirical indicators of the death of metaphors”. Paper read at the 11th Euro-Inter-
national Systemic Functional Workshop: Metaphor in systemic functional perspectives,
University of Gent (Belgium), 14–17 July 1999.
Moon, R. 1998. Fixed Expressions and Idioms in English: A Corpus-Based Approach. Oxford:
Clarendon.
Niemeier, S. 1998. “Colourless green ideas metonymise furiously”. Rockstocker Beträge zur
Sprachwissenschaft 5, 119–146.
Philip, G. 2003. Collocation and Connotation: A corpus-based investigation of colour words in
English and Italian. PhD thesis. The University of Birmingham, UK. Available from http://
amsacta.cib.unibo.it/archive/00002266 (accessed 25 March 2008)
Ragazzini, G. (ed.) 1995. Il Ragazzini: Dizionario inglese italiano – italiano inglese (3rd edition).
Bologna: Zanichelli.
Rossini Favretti, R. 2000. “Progettazione e costruzione di un corpus di italiano scritto: CORIS/
CODIS.” In Linguistica e informatica: Corpora, multimedialità e percorsi di apprendimento,
R. Rossini Favretti (ed.), 39–56. Rome: Bulzoni.
Siepmann, D. 2005. “Collocation, colligation and encoding dictionaries. Part 1: Lexicological
aspects.” International Journal of Lexicography 18(4): 409–443.
Sinclair, J. M. 1991. Corpus, Concordance, Collocation. Oxford: OUP.
Sinclair, J. M. 1996. “The Search for Units of Meaning.” TEXTUS 9(1), 75–106.
Tognini Bonelli, E. 2001. Corpus Linguistics at Work. Amsterdam and Philadelphia: John Ben-
jamins.
Tognini Bonelli, E. 2002. “Functionally complete units of meaning across English and Italian:
Towards a corpus-driven approach.” In Lexis in Contrast, B. Altenberg and S. Granger
(eds), 73–95. Amsterdam and Philadelphia: John Benjamins.
Váradi, T. and Kiss, G. 2001. “Equivalence and Non-equivalence in Parallel Corpora.” Interna-
tional Journal of Corpus Linguistics 6 (special issue): 167–177.
Virtual corpora as documentation resources:
Translating travel insurance documents
(English-Spanish)*
* The research reported in this paper has been carried out in the framework of the R&D
projects BFF2003-04616 (Spanish Ministry of Science and Technology/EU ERDF, 2003–2006)
and HUM-892 (Andalusian Ministry of Education, Science and Technology, 2006–2009).
76 Gloria Corpas Pastor and Miriam Seghiri
1. Introduction
Since the tourist industry is one of the principal driving forces behind the Spanish
economy,1 it is hardly surprising that there is a large demand for translations of
insurance policies in the tourism sector, both from Spanish into English and from
English into Spanish (cf. ACT 2005). Although this economic reality could be
transitory, the rights of European consumers to demand translations of this type
of document under the auspices of European directives2 on insurance matters
and their respective national transpositions3 should also be taken into account.
These directives recognise the right of the party taking out insurance to receive a
contract4 written not only in the official language of the member state where the
agreement is made, but also in a language which they may specify. Subsequent
directives, such as 2002/92/EC,5 have also increased demand for translations of all
the formal documents that constitute the contract. In the following pages, we shall
1. Tourism is responsible for a huge volume of business in the international economy, with
Europe occupying a privileged position at the top of the world scale. In 2006 Europe generated
$6,466.2 billion in this sector, equivalent to 10.3% of the world’s gross domestic product (GDP);
this share is forecast to rise to 11% by 2011, and the sector accounts for 8.7% of total employment
(cf. WTTC 2006a). See also the WTTC studies of the United Kingdom (2006b), Ireland (2006c) and
Spain (2006d) for a more detailed analysis of the figures for these countries in this sector.
2. We refer to the Third EC Directive on Non-Life Insurance (92/49/EEC) and the Third EC
Directive on Life Assurance (92/96/EEC).
3. These transpositions, which are primarily aimed at consumer protection and fostering linguistic
plurality in Europe, are implemented, in the case of Spain, by the Ley 18/1997, de 13 de mayo,
de modificaciones del artículo 8 de la Ley de Contrato de Seguro, para garantizar la plena utilización
de todas las lenguas oficiales en la redacción de los contratos [Act 18/1997, 13th May, amending
Article 8 of the Insurance Contract Act to guarantee the full use of all official languages in drafting
contracts] (BOE, 14th May 1997); in the case of the United Kingdom, by Statutory Instrument 2004
No. 353, the Insurers (Reorganisation and Winding Up) Regulations 2004; and, finally, in the case of
the Republic of Ireland, by the Insurance Act 2000.
4. The policy (póliza, in Spanish) is the document which gives physical form to the insurance
contract. In addition, it is where the obligations and rights of both the insurer and the insured
person are set out, where the persons or objects that are insured are defined and the guarantees
and compensation in the case of damage are established. It also represents the formalisation
and culmination of the whole process of contracting the insurance. As a result, in many cases
the insurance policy may be referred to as the contrato (contract) (cf. Ley 50/1980; Insurance Act
2000; The Financial Services and Markets Act 2000).
5. We refer specifically to Directive 2002/92/EC of the European Parliament and of the Council
of 9 December 2002 on insurance mediation. In Article 13 of this directive, under “Information
conditions”, it is specified that “All information to be provided to customers in accordance with
Article 12 shall be communicated: (a) on paper or on any other durable medium available and
accessible to the customer; (b) in a clear and accurate manner, comprehensible to the customer;
Virtual corpora as documentation resources 77
guage Engineering for Translators Curricula), the use of corpora has only really
come to the attention of researchers working in the field of translation training
relatively recently. Examples of studies that stand out are: Kenny (2001) on the
subject of literary translation based on parallel corpora in German and English;
(c) in an official language of the Member State of the commitment or in any other language
agreed by the parties.”
6. So many corpus compilers have emerged in Europe that we can list only some of
the more important examples: ACL (Association for Computational Linguistics); ECI (European
Corpus Initiative); LDC (Linguistic Data Consortium); ICAME (International Computer Archive
of Modern and Medieval English); ACL/DCI (Association for Computational Linguistics Data
Collection Initiative) and ELRA (European Language Resources Association).
7. See <http://www.iai.uni-sb.de/docs/D3.pdf>. In its final report, which was presented to
the European Commission DG XII, the LETRAC project stressed the importance of introducing
the following elements to the curriculum of translation degrees: applied IT, terminology
management programmes, CAT and AT systems, ICTs and linguistic engineering, as well as
leaving time for publishing programmes, the Internet, controlled languages, project management,
translation memories and corpus linguistics.
Corpas Pastor (2001, 2003b, 2004a, b and c) on legal and medical translation
based on multilingual corpora compiled from the Internet; and Sánchez-Gijón
(2003a: NP) on virtual ad hoc corpora for scientific translation in
the English-Spanish language pair. Other examples are Bernardini and
Zanettin (2000), Bowker and Pearson (2002), and Zanettin, Bernardini and Stewart
(2003) on the possibilities offered by corpora for specialised language teaching.
Two studies that deal with the potential use of corpora in language teaching, natural
language processing and translation are Aston (2001) and Granger and Petch-Tyson
(2003). Finally, in the R&D project described in Corpas Pastor (2003a) the
corpus was used as a fundamental documentation resource for the translation of
legal texts; this new avenue of research was further developed some years later by
Seghiri (2006).
Both researchers and teachers agree on the importance of
corpora in translation training and practice. Some authors have gone even further
and single out virtual corpora (cf. Pearson 1998; Bernardini and
Zanettin 2000; Corpas Pastor 2001 and 2004a; Zanettin 2002a and b; Sánchez-Gijón
2003a and b) as one of the translator’s most important aids when faced with
a specialised text. By virtual corpus we mean a corpus compiled from electronic
sources exclusively in order to carry out a specific translation in any direction
(direct, inverse or indirect8). Its principal objective is to construct a reliable resource
quickly and at minimal cost, based on texts mined from the Internet, to satisfy the
translator’s documentation needs.
Virtual corpora may also be referred to as ad hoc (Corpas Pastor 2001: 164;
Sánchez-Gijón 2003a: 3), disposable (Zanettin 2002a), do-it-yourself/DIY (Zanettin
2002a), domain-specific (Corpas Pastor 2004a: 226), web (Fletcher 2004), electronic
(Corpas Pastor 2001; Varantola 2003), ephemeral (Corpas Pastor 2004a: 226),
precision (Varantola 1997) and special purpose (Pearson 1998; Sánchez-Gijón
2003a) corpora.
Translators turn to the Internet in search of solutions to information and documentation
problems because they are not only translating between languages
(for which a good dictionary, whether online or not, would suffice), but also
between discourse communities or cultures. In this context, corpus compilation
and the Internet appear to be two of the most important documentation
resources in the practice and research of specialised translation. When facing this
8. A “direct translation” is a translation done directly from the original into the translator’s
native language, without an intermediary text; an “inverse translation”, also called “other tongue
translation (OTT)”, is a translation from the translator’s native language into another language;
finally, an “indirect translation”, also called “mediated translation”, is a translation done
via an intermediary translation in a third language, not directly from the original.
kind of assignment, the main problem that translators come up against is that a
corpus for the particular speciality is not available for consultation on the Internet
or, if one already exists, it often does not cover all the information requirements of
the source text. In other words, “one problem with these typically small and domain
specific corpora is the limited range of topics and text types for which they
are available” (Zanettin 2002a: NP). Faced with this situation, translators have no
alternative but to compile their own virtual corpora for each specific translation
that has been commissioned.
It is also important to bear in mind that a set of texts does not, in and
of itself, constitute a corpus. In order for a collection of texts to be considered a
corpus in the strict sense of the term, it must meet a set of clear design criteria
and abide by a specific compilation protocol, so that the collection may be deemed
representative of the field of specialisation or the particular type of document
being translated.
In this section we will outline the design parameters that the creation of a virtual
corpus demands. Following this, we will propose a compilation protocol in the
form of guidelines. The protocol consists of four distinct phases: (1) locating and accessing
resources, (2) downloading data, (3) text formatting and (4) data storage.
9. Another document is the duplicado de la póliza (a duplicate of the policy), which is drawn
up in writing by the insurer if requested by the person who takes out the insurance, the insured
up in Spain, the Republic of Ireland and the United Kingdom (Scotland, Wales,
England and Northern Ireland). In addition, it will be necessary to compile a
comparable corpus, made up of two subcorpora, one in Spanish and the other in
English, which will include the original texts of the tourism contracts. This will
be a textual corpus, i.e. a full-text corpus, since it will include complete texts, and
a specialised corpus, in the sense that it includes specific text types dealing with
communication between specialists and semi-specialists or laymen.
A travel insurance corpus compiled in accordance with these design criteria
will be essentially unbalanced,10 since quality takes priority over quantity (Corpas
Pastor 2004a: 236) in this type of virtual corpus which has been compiled ad hoc.
It is, however, extremely homogeneous, given that it has been created for a specific
purpose.
Once the preliminary design parameters have been established the translator-
compiler should follow a protocol for the creation of the corpus comprising four
stages which will now be described.
person or the beneficiary. The insurer is obliged to provide a duplicate or copy of the policy if
the original is mislaid; the copy must be identical to the original and have the same validity. In
addition, there is also a document known as the boletín de adhesión (a joining form), which gives
proof of the insurance; it has not been included here because it only applies to life insurance
policies.
10. Unbalanced because of the distribution of languages on the Internet. According to the “Top
Ten Languages Used in the Web (November 2007)” published by Internet World Stats
(http://www.internetworldstats.com/stats7.htm), Spanish speakers account for 9.0% of all
Internet users in the world, while English speakers account for 30.1%.
We shall begin with an institutional search,11 one of the most productive types
of search for constructing corpora. This is due not only to the great quantity of
documents that these institutions, organisations and associations store on
the Internet today, but also to the fact that these documents can be assumed to be of a
high standard in terms of both quality and reliability, because their writers are specialists
in the field. This institutional search will be carried out mainly, though not exclusively,
from institutional, regulatory and legislative sources. The following web sites and
web pages may be used to locate legislation.
In terms of official bodies and institutions, legislative information can be
taken from the web sites of the ABI (Association of British Insurers),12 the
ABTA (Association of British Travel Agents)13 or the FSA (Financial Services
Authority)14 for the United Kingdom and Ireland. For Spain, information can be
mined from the Mesa del Turismo,15 particularly the section called “legislación
general” (general legislation), which includes regulatory laws and laws specifically
related to the tourism sector.
Another outstanding web site is that of the WTO (World Tourism Organisation),16
which contains one of the principal documentation resources for legislative
material, Lextour.17 This is the WTO’s database of tourism legislation, which
has links to web sites, databases and external servers concerned with tourism
legislation set up by parliaments, governmental organisations, universities and
professional associations. We have also taken information from other databases
to obtain Community legislation, such as the well-respected Westlaw.18 However,
11. On numerous occasions, it may be necessary to perform a key word search to find the
names of more organisations to be used in the institutional search. This can usually be done
by entering descriptors combined with Boolean operators in a search engine such
as Google. For example, entering organismo OR turismo, organismo AND turismo OR “organismo
turístico” (body, tourism, tourist body) will increase the number of names of organisations
connected with tourism, whose web sites can then be visited in order to extract information that
may be suitable for inclusion in the travel insurance corpus.
12. Available at <http://www.abi.org.uk>.
13. Available at <http://www.abta.com>.
14. Available at <http://www.fsa.gov.uk/consumer>.
15. Available at <http://www.mesadelturismo.com>.
16. Available at <http://www.world-tourism.org>.
17. Available at <http://www.world-tourism.org/doc/S/lextour.htm>.
18. Available at <http://web2.westlaw.com/signon/default.wl?bhcp=1>.
our most significant source has been EUR-Lex,19 the portal to European Union
law, which is currently the best database in this area.
Practically all the documents involved in the process of making a contract for
travel insurance may be found on the web sites of the big insurance companies. In
addition, although less frequently, the web sites of numerous online travel agencies
contain, for their customers’ information, the texts of the policies they resell from
various insurance companies. Similarly rich sources of information are the web sites
of international insurance companies such as Mondial Assistance20 or Europ Assistance,21
and of British and Irish insurance companies such as AT Bell Insurance Brokers Ltd,22
Royal and Sun Alliance23 or Lloyd’s of London;24 or
directories. In this case, a problem with locating information may arise from
the structure of the directories themselves, which can even hinder the extraction
of documentation.
Specialist directories stand out as excellent resources for locating Community,
national and regional (autonomous community) legislation, especially when the resources they
contain are also evaluated and commented upon. This is the case for the compilation
of the Spanish subcorpus, using the section called “Dret” (Law) in the “Indices” of
directories of The Argus Clearinghouse31 and Search the Law32 (particularly the
lustrate how searches are made to locate the texts that will comprise the corpus.
In order to do this, the text types and the field of insurance in which the desired
information is to be found (travel insurance) are taken as descriptors, and Boolean
search techniques are applied using the user-friendly interface offered by, for
instance, Google’s advanced search.34
Table 1. Descriptors for finding the formal elements of travel insurance contracts (Spanish).
Text type | Descriptors | Search equation
Póliza | Póliza, seguro turístico, asistencia en viaje35 | póliza AND “seguro turístico”; póliza AND “asistencia en viaje”
Solicitud | Solicitud de póliza, seguro turístico, asistencia en viaje | solicitud AND póliza AND “seguro turístico”; solicitud AND póliza AND “asistencia en viaje”
Propuesta | Propuesta, proposición, seguro turístico, asistencia en viaje | póliza AND propuesta OR proposición “seguro turístico”; póliza AND propuesta OR proposición “asistencia en viajes”
Carta de Garantía | Carta de garantía, seguro turístico, asistencia en viaje | “carta de garantía” AND “asistencia en viaje”; “carta de garantía” AND “seguro turístico”
Table 2. Descriptors for finding the formal elements of travel insurance contracts (English).
Text type | Descriptors | Search equation
Policy | Policy, travel insurance | policy AND “travel insurance”
Quote | Quote, travel insurance | quote AND policy AND “travel insurance”
Proposal Form | Proposal Form, travel insurance | “proposal form” AND policy AND “travel insurance”
Certificate of Insurance | Certificate of Insurance, Insurance Certificate, travel insurance | “certificate of insurance” OR “insurance certificate” AND policy
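The equations in Tables 1 and 2 follow a simple pattern: multiword descriptors are quoted as exact phrases and the terms are joined with AND. A minimal Python sketch of that pattern follows, assuming a search engine that accepts straight-quoted phrases and the AND/OR operators. The function names, the sample dictionary modelled on Table 1 and the explicit parentheses around OR alternatives are our own illustrative choices, not part of the original tables; whether a given engine honours parentheses is an assumption that should be checked.

```python
from typing import Dict, Iterator, List, Tuple


def quote(term: str) -> str:
    """Quote multiword descriptors so the engine treats them as exact phrases."""
    if " " in term and not term.startswith(("(", '"')):
        return f'"{term}"'
    return term


def search_equation(*descriptors: str) -> str:
    """Join descriptors with AND, as in the equations of Tables 1 and 2."""
    return " AND ".join(quote(d) for d in descriptors)


def alternatives(*descriptors: str) -> str:
    """Group synonymous descriptors with OR (the parentheses are an assumption)."""
    return "(" + " OR ".join(quote(d) for d in descriptors) + ")"


def equations_for(text_types: Dict[str, List[str]],
                  field_terms: List[str]) -> Iterator[Tuple[str, str]]:
    """Combine each text type's descriptors with each field descriptor."""
    for text_type, terms in text_types.items():
        for field in field_terms:
            yield text_type, search_equation(*terms, field)


# Hypothetical input modelled on Table 1.
SPANISH_TEXT_TYPES = {
    "Póliza": ["póliza"],
    "Solicitud": ["solicitud", "póliza"],
    "Propuesta": ["póliza", alternatives("propuesta", "proposición")],
    "Carta de Garantía": ["carta de garantía"],
}
FIELD_TERMS = ["seguro turístico", "asistencia en viaje"]

if __name__ == "__main__":
    for text_type, equation in equations_for(SPANISH_TEXT_TYPES, FIELD_TERMS):
        print(f"{text_type}: {equation}")
```

Keeping the equations in code makes it easy to regenerate them for the English subcorpus by swapping in the descriptors of Table 2.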
for the United Kingdom and .ie for Ireland will therefore be of use. In addition, pages in Spanish
with the domain .ar (Argentina) or .mx (Mexico) and pages in English with the domain .au
(Australia) or .us (United States) will be automatically ruled out because they are not
appropriate for our corpus.
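The domain-based screening described in this note can be sketched in Python with the standard urllib.parse module. The sketch assumes that the last label of the host name identifies the country. The accepted set includes .es for Spain as our own assumption, alongside the .uk and .ie domains and the .ar, .mx, .au and .us exclusions named above. Generic domains such as .com reveal nothing about the country, and some of the Spanish sources cited in this chapter use them, so the sketch flags them for manual review instead of discarding them.

```python
from urllib.parse import urlparse

# Country-code domains named in the note; .es for Spain is our assumption.
ACCEPTED_TLDS = {"es", "uk", "ie"}
EXCLUDED_TLDS = {"ar", "mx", "au", "us"}


def top_level_domain(url: str) -> str:
    """Return the last label of the host name, e.g. 'uk' for www.abi.org.uk."""
    host = urlparse(url).hostname or ""  # hostname is already lower-cased
    return host.rsplit(".", 1)[-1]


def classify_url(url: str) -> str:
    """Label a URL as 'keep', 'discard' or 'review' by its top-level domain."""
    tld = top_level_domain(url)
    if tld in ACCEPTED_TLDS:
        return "keep"
    if tld in EXCLUDED_TLDS:
        return "discard"
    # Generic domains such as .com or .org say nothing about the country,
    # so they are left for the translator-compiler to check by hand.
    return "review"
```

For example, classify_url("http://www.abi.org.uk") returns "keep", while a .com site such as the Mesa del Turismo's is returned as "review".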
35. We refer mainly to seguro turístico or travel insurance, in accordance with the position
taken by Aurioles (cf. Aurioles Martín 2005 [2002] and Aurioles Martín et al. 2004), because
we believe it to be more accurate than asistencia en viaje, the Spanish calque of the original
English, since travel assistance is only one possible part of travel insurance, which may also
include coverage for holiday cancellation or medical attention, to cite only some of the most
common examples. For a wider perspective on this question, see the trilingual
(Spanish-English-Italian) classification of travel insurance policies in relation to coverage
outlined by Seghiri (2006: 279–281).
The main difficulty with key word searches lies in choosing the most precise
descriptors for the intended search, since otherwise a large amount of
irrelevant information will be returned. It is up to the translator-compiler to filter
out all this “noise” from each of the pages that will be included in the corpus.
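A first, purely mechanical pass at this filtering can be sketched in Python. The sketch assumes that the pages have already been downloaded and reduced to plain text, and that the equation uses only AND with straight double quotes. The helper names are hypothetical. The sketch only narrows the candidates: plain substring matching will also accept inflected forms such as pólizas, and the final decision on each page still rests with the translator-compiler.

```python
import re
from typing import Dict, List


def required_terms(equation: str) -> List[str]:
    """Extract the terms of an AND-only equation, keeping quoted phrases whole."""
    terms = []
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', equation):
        term = phrase or word
        if term != "AND":
            terms.append(term.lower())
    return terms


def normalise(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace, including line breaks."""
    return " ".join(text.lower().split())


def matches(text: str, equation: str) -> bool:
    """True if every term of the equation occurs somewhere in the text."""
    body = normalise(text)
    return all(normalise(term) in body for term in required_terms(equation))


def filter_pages(pages: Dict[str, str], equation: str) -> List[str]:
    """Names of the downloaded pages that deserve a closer manual look."""
    return [name for name, text in pages.items() if matches(text, equation)]
```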
ing in batches.
This downloading phase may be hampered by the inherent structure of the
Internet itself. On the one hand, web pages are written in a mark-up language, HTML;
in other words, the information is organised in hypertext nodes, which are often
difficult to access. This is usually because the content has been inappropriately
labelled or because the location of the information on the page is hard to see.
On the other hand, the wide variety of formats in which the information may appear
must also be considered.
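Note 36 points to GNU Wget as a ready-made tool for downloading in batches. With Wget, a list of URLs saved one per line in a text file can be fetched with wget -i followed by the file name. Where a scripted alternative is preferred, a minimal Python sketch using only the standard library is shown below. The numbered file-naming scheme and the one-second pause between requests are our own assumptions. Files are saved byte for byte, so PDFs and HTML pages still have to go through the conversion and normalisation steps.

```python
import os
import re
import time
import urllib.request
from typing import Iterable, List
from urllib.parse import urlparse


def local_filename(url: str, index: int) -> str:
    """Build a numbered, filesystem-safe file name from the last part of the URL path."""
    name = os.path.basename(urlparse(url).path) or "index.html"
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    return f"{index:04d}_{safe}"


def download_batch(urls: Iterable[str], out_dir: str, pause: float = 1.0) -> List[str]:
    """Fetch each URL into out_dir, keeping the original bytes (HTML, PDF, ...)."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls, start=1):
        target = os.path.join(out_dir, local_filename(url, index))
        try:
            with urllib.request.urlopen(url, timeout=30) as response:
                data = response.read()
        except (OSError, ValueError) as err:
            # URLError, HTTPError and timeouts derive from OSError;
            # ValueError covers malformed URLs.
            print(f"skipped {url}: {err}")
            continue
        with open(target, "wb") as handle:
            handle.write(data)
        saved.append(target)
        time.sleep(pause)  # be polite to the servers being mined
    return saved
```

Numbering the files in download order keeps the link between each stored text and its position in the original URL list.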
completed by what might be called normalisation, since all the documents will
be converted to an ASCII or plain text format. In other words, they are stripped
36. This free software, together with its instruction manual, may be downloaded from the
following web site: <http://www.gnu.org/software/wget/>.
37. A trial version of Solid Converter may be downloaded free of charge from
<http://www.solidpdf.com>. Given that it is a free trial version, it has a number of limitations:
it only functions for a two-week period and permits conversion of a maximum of ten pages per
document, although it is possible to convert a complete text over a number of operations by
specifying a different set of pages each time. There are other free programs available online,
for instance Pdf to Word converter 3.0
(<http://www.geomundos.com/descargas/bajar-pdf-to-word-converter-30_233.html>), PDF Converter
(<http://www.freepdfconvert.com/convert_pdf_to_source.asp>) or Easy PDF to Word Converter
(<http://www.pdf-to-html-word.com/>).
of the HTML or code of any other kind, in accordance with the clean-text policy
described by Sinclair (1991: 21).
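The clean-text step can be sketched with Python's standard html.parser module. In this sketch tags are discarded, script and style content is skipped, character references such as &amp; are decoded, and block-level elements become line breaks. It is a minimal illustration, not the procedure the authors used. The set of block elements is our own assumption. The output also keeps accented characters as Unicode text instead of reducing them to strict ASCII, since stripping the accents would damage the Spanish texts. PDF files need a converter such as those listed in note 37 first.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, dropping tags, scripts and styles."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "tr", "td", "h1", "h2", "h3", "h4", "title"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into characters
        self.parts: list = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return clean text: one line per block element, whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```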
38. On the subject of the legislative documents that form part of the corpus (17 texts in English
and 2 texts in Spanish), it is important to point out that travel insurance is not regulated by its
own substantive legislation. Instead, it comes under the regulations that apply to all insurance other
than life insurance, through various Community directives such as 73/239/EEC, 73/240/EEC,
76/580/EEC, 78/473/EEC, 84/641/EEC, 87/343/EEC, 87/344/EEC, 88/357/EEC, 90/618/EEC,
92/49/EEC, 95/26/EEC, 2000/26/EC, 2000/64/EC and 2002/13/EC. In Spain, travel insurance
contracts are also currently regulated by the Ley 50/1980, de 8 de octubre, de Contrato de Seguro
[Act 50/1980, 8th October, Insurance Contracts] as well as the Ley 30/1995, de 8 de noviembre,
de ordenación y supervisión de los Seguros Privados [Act 30/1995, 8th November, Planning and
Supervision of Private Insurance]. In Ireland, insurance contracts are regulated by the Insurance
Act, 2000, as well as the European Communities (Non-Life Insurance) Framework Regulations,
1994 (S.I. No. 359 of 1994). In the United Kingdom, they are regulated by the Financial Services
and Markets Act 2000 (Statutory Instrument 2003 No. 1476), specifically Amendment No. 2,
Order 2003. In relation to policies, the central document in this type of agreement, it was
possible to include 101 documents (1,000,067 words) in the Spanish policies component and
176 documents (1,903,661 words) in the English policies component. The remaining
formal elements of the contract are included in the rest of the corpus.
Sánchez Pérez and Cantos Gómez (1997). However, some of these authors, such as
Cantos (Yang et al. 2000: 21), subsequently recognised some shortcomings in these works,
suggesting that they might be attributed to the use of Zipf’s law.41
Zipf’s law42 can give us an idea of the breadth of vocabulary used, but it is not
39. A surprising number of research projects that endeavour to compile a “representative”
corpus hardly seem to touch on this concept. Usually, the availability of material in the
particular field of study determines the final size of the corpus (Giouli and Piperidis 2002).
40. Indeed, out of this work came the rule known as Heaps’ law. Both Zipf’s and Heaps’ laws
are used to capture the variability of corpora. Heaps’ law is an empirical law which describes the
relationship between vocabulary size, that is, the number of different words (types),
and the total number of words in a text (tokens). In this way, the sequential growth of vocabulary
in relation to text type can be observed. The programme ReCor has been validated using this
law (cf. Seghiri 2006: 399–403).
41. Conscious of these deficiencies, Yang et al. (2000) attempted to overcome them by taking
a new approach: a mathematical tool capable of predicting the relationship between linguistic
elements in a text (types) and the size of the corpus (tokens). However, at the end of their study,
the authors reflected on some of its limitations: “the critical problem is, however, how to
determine the value of tolerance error for positive predictions” (Yang et al. 2000: 30).
42. For a historical perspective on how Zipf’s law was developed, see Moreiro González (2002).
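Heaps’ law, described in note 40, is usually written V = K·N^β, where V is the number of types, N the number of tokens, and K and β (0 < β < 1) are constants estimated from the data. A minimal Python sketch of that estimation follows. It records the vocabulary growth curve over a token stream and fits K and β by ordinary least squares on the logarithms of both quantities. The function names and the choice of plain least squares are our own. The sketch is not presented as the procedure used to validate ReCor.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def vocabulary_growth(tokens: Iterable[str], step: int = 1) -> List[Tuple[int, int]]:
    """Return (tokens seen, distinct types seen) every `step` tokens."""
    seen = set()
    points = []
    for n, token in enumerate(tokens, start=1):
        seen.add(token)
        if n % step == 0:
            points.append((n, len(seen)))
    return points


def fit_heaps(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit V = K * N**beta by least squares on log V = log K + beta * log N."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(v) for _, v in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need points with at least two different token counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    k = math.exp(mean_y - beta * mean_x)
    return k, beta
```

In practice the points would come from vocabulary_growth over the tokenised corpus, with a step of a few hundred tokens to keep the curve manageable.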
Numerous studies have been based on this law, but the conclusions they reach
do not specify, even graphically, the number of texts that are
necessary to compile a corpus for a particular specialised field (Almahano Güeto
2002: 281).
A possible solution could be to analyse the lexical density of a corpus in relation
to the amount of documentary material included. In other words, if the ratio
between the number of different words in a text and the total number of
words (types/tokens) is an indicator of lexical density or richness, it may be possible
to create a formula that represents lexical density as the corpus grows
on a document-by-document basis: once a certain number of texts have been
included, the number of types no longer increases in proportion to the number of
words the corpus contains.
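A short Python sketch makes this idea concrete. It adds documents one at a time and records the cumulative number of types, the number of tokens and their ratio after each addition. The tokeniser (runs of word characters, lower-cased) is a hypothetical choice made for illustration, and so is the stabilisation criterion, which reports the point from which the ratio changes by less than a chosen tolerance over several consecutive documents. Neither is the formula implemented in ReCor.

```python
import re
from typing import List, Optional, Sequence, Tuple

WORD = re.compile(r"\w+")

Row = Tuple[int, int, int, float]  # (documents added, types, tokens, types/tokens)


def tokenize(text: str) -> List[str]:
    return [w.lower() for w in WORD.findall(text)]


def cumulative_ttr(documents: Sequence[str]) -> List[Row]:
    """Types, tokens and types/tokens after each document is added to the corpus."""
    vocabulary = set()
    tokens = 0
    rows = []
    for i, doc in enumerate(documents, start=1):
        words = tokenize(doc)
        vocabulary.update(words)
        tokens += len(words)
        ratio = len(vocabulary) / tokens if tokens else 0.0
        rows.append((i, len(vocabulary), tokens, ratio))
    return rows


def stabilisation_point(rows: Sequence[Row], tolerance: float = 0.001,
                        window: int = 3) -> Optional[int]:
    """Hypothetical criterion: first document count from which the ratio changes
    by less than `tolerance` over `window` consecutive additions."""
    ratios = [row[3] for row in rows]
    for i in range(len(ratios) - window):
        if all(abs(ratios[j + 1] - ratios[j]) < tolerance for j in range(i, i + window)):
            return rows[i][0]
    return None
```

Plotting the fourth column of cumulative_ttr against the first gives the kind of graph from which an approximate minimum number of documents can be read off.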
This formula may make it possible to determine the minimum size that a
corpus must reach for it to begin to be representative. With the help of graphs,
it should be possible to establish whether the corpus is representative and
approximately how many documents are necessary to achieve this. This theory has
become a practical reality in the shape of a software application, ReCor,43 which
ReCor’s interface is simple, intuitive and user-friendly (see Figure 1). Firstly, an input
file may be selected; this could be anything from a particular clause in a policy
43. ReCor is an acronym derived from the function it was designed for: the representativeness
of corpora.
to the entire corpus. There is also an option, “Filtro de entrada” (input filter), which filters out
all the words that the user wants to exclude from the analysis, such as addresses, proper
names or even HTML tags if the corpus has not been “cleaned”.
Next, three output files are created. The first, “Análisis estadístico” or statistical
analysis, collates the results from two distinct analyses: first with the files ordered
alphabetically by name and then with the files in random order. The resulting
document is structured into five columns, which show the number of
types, the number of tokens, the ratio between the number of different words
and the total number of words (types/tokens), the number of words that appear
only once (V1) and the number of words that appear only twice (V2). The second
output file, “Palabras ord. alfa.” (words in alphabetical order), generates two columns:
the first shows the words in alphabetical order and the second their corresponding
number of occurrences. The same information is shown in the third file, “Palabras
ord. frec.” (words by frequency), but this time the words are ordered according to
their frequency, in other words, by their rank. The application also allows the user
to work with groups of up to ten words (n-grams)44 and phraseology, as well as allowing num-
44. In this study we used version 2.1 of ReCor. We are currently working on a new version
(ReCor 3.0), which can work with multiple and very large files more quickly and also allows
phraseological units to be identified on the basis of an analysis of the corpus’s n-grams
(1 ≤ n ≤ 10).
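The following Python sketch computes figures comparable to ReCor’s three output files for a small set of named texts. It produces the five columns (types, tokens, types/tokens, V1, V2) after each file is added, once with the files in alphabetical order and once in random order. It also builds the alphabetical and frequency-ordered word lists and counts n-grams. This is our own approximation for illustration, not ReCor’s code. The tokeniser is an assumption, and so is Python’s default code-point ordering, which is not Spanish collation.

```python
import random
import re
from collections import Counter
from typing import List, Optional, Sequence, Tuple

WORD = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    return [w.lower() for w in WORD.findall(text)]


def growth_table(files: Sequence[Tuple[str, str]]) -> List[Tuple[int, int, float, int, int]]:
    """Rows of (types, tokens, types/tokens, V1, V2) after each file is added."""
    counts: Counter = Counter()
    rows = []
    for _name, text in files:
        counts.update(tokenize(text))
        tokens = sum(counts.values())
        types = len(counts)
        v1 = sum(1 for c in counts.values() if c == 1)  # words occurring once
        v2 = sum(1 for c in counts.values() if c == 2)  # words occurring twice
        rows.append((types, tokens, types / tokens if tokens else 0.0, v1, v2))
    return rows


def statistical_analysis(files, seed: Optional[int] = None):
    """Run the growth table with files in alphabetical and in random order."""
    alphabetical = sorted(files, key=lambda f: f[0])
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)
    return growth_table(alphabetical), growth_table(shuffled)


def alphabetical_list(counts: Counter) -> List[Tuple[str, int]]:
    return sorted(counts.items())


def frequency_list(counts: Counter) -> List[Tuple[str, int]]:
    """Words ordered by rank (descending frequency), ties broken alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count contiguous n-word sequences (note 44 mentions 1 <= n <= 10)."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
```

Whatever order the files are taken in, the last row of both tables is identical, because the whole corpus has been counted by then; only the shape of the growth curve differs between the two orderings.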
45. It should be noted here that 0 (zero) is unachievable because texts contain variables
that are impossible to control, such as addresses, proper names or numbers, to name
only some of the more frequently encountered.
Extract 1 (ST):46
Important
This is your travel insurance policy. It contains details of cover,
conditions and exclusions relating to each insured person and is the
basis on which all claims will be settled.
46. The extract comes from a travel insurance policy from the British insurance company Direct
Travel Insurance: <http://www.direct-travel.co.uk/FAQ/Wordings/policywording010506.pdf>.
Extract 2 (ST):47
CONDICIONES GENERALES
Artículo Preliminar.-El Contrato de Seguro.-El presente Contrato
de Seguro se rige por lo dispuesto en la Ley 50/1980, de 8 de octubre,
de Contrato de seguro, en la Ley 30/1995, de 8 de Noviembre, de Or-
denación y Supervisión de los Seguros Privados.
Even two short ST fragments like those chosen in 5.1 offer abundant evidence in
favour of the use of comparable corpora in the actual translation process.
We are mainly concerned with the terminological and phraseological needs of
translators, the extraction of conceptual or domain information, and the comparison
of textual and discourse features in the source and target languages.
47. The extract comes from a travel insurance policy from Agrupación Astes, Seguro Turístico
published on the web site of the travel agents, Condor Vacaciones S.A: <http://www.special-
tours.com/ficheros/Seguro_Europa_ES.pdf>.
be pointed out that asistencia en viaje appears 107 times. This clearly demonstrates
the preference in Spanish for the English calque when drawing up this type
of document, as well as the influence of English as the lingua franca par excellence
(often referred to as “international legal English”) and its impact on legislation in
the field of travel insurance in peninsular Spanish.
Similar problems arise for translators when faced with translating El Contrato
de Seguro (cf. Extract 2) into English, as there appear to be two possibilities:
assurance contract or insurance contract. A search for contract in the corpus reveals
a preference in English for contract of insurance (cf. Figure 8). In addition, when it
appears in this particular position in the text, a fixed expression (This is your
contract of insurance) can be identified, which should be reproduced in translation.
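The kind of check described here, which involves counting competing candidate equivalents and reading their contexts, can be approximated with a few lines of Python. The sketch below counts whole-phrase matches for each candidate and builds a simple keyword-in-context (KWIC) concordance. It is a rough stand-in for the concordancer the authors used (WordSmith Tools 4.0, note 48). The function names, the case-insensitive matching and the context width are our own assumptions.

```python
import re
from typing import Dict, Iterable, List


def _pattern(phrase: str):
    """Case-insensitive whole-word pattern; internal spaces match any whitespace."""
    words = [re.escape(w) for w in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


def phrase_counts(text: str, phrases: Iterable[str]) -> Dict[str, int]:
    """Number of occurrences of each candidate phrase in the text."""
    return {p: len(_pattern(p).findall(text)) for p in phrases}


def concordance(text: str, node: str, width: int = 30) -> List[str]:
    """KWIC lines: `width` characters of left and right context around each hit."""
    flat = " ".join(text.split())
    lines = []
    for m in _pattern(node).finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines
```

Running phrase_counts over the English subcorpus with the candidates contract of insurance, insurance contract and assurance contract gives the raw preference, and concordance(text, "contract") shows the positions in which each one occurs.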
48. The analysis of concordances was carried out using WordSmith Tools 4.0.
The next problem that could arise for the translator is how to translate the
English cover, conditions and exclusions (cf. Extract 1) into Spanish. A search in
the Spanish corpus for the literal translation condiciones, coberturas y exclusiones
returns only one concordance. On this point it is important to remember that legal
language is characterised not only by its precision, but also by its formulaic and
extremely conservative style. The translator should be aware of the abundance of
verbose and often redundant phraseological units and other fixed expressions, and of
the archaic or conventional forms that these texts contain, often with the sole
purpose of making them appear more grandiose. Finally, the Spanish corpus revealed
that the term exclusiones is always found as part of the phraseological unit límites
y exclusiones (or else garantías, límites y exclusiones), as the concordances
returned for the search term exclusiones show (cf. Figure 9).
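Figure 9 is a key-word-in-context (KWIC) display. As a rough sketch of what such a display involves, the hypothetical helper `kwic` below finds each whole-word occurrence of a search term. It then shows a fixed number of characters of left and right context, with the node word in brackets. The two Spanish sentences are invented for illustration and are not taken from the corpus.

```python
import re


def kwic(text, node, width=30):
    """Return key-word-in-context lines for every whole-word match of node."""
    lines = []
    pattern = r"\b" + re.escape(node) + r"\b"  # \b is Unicode-aware for str patterns
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so that the node words line up in a column
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines


text = ("Consulte las garantías, límites y exclusiones de la póliza. "
        "Los límites y exclusiones se detallan en el Artículo 3.")
for line in kwic(text, "exclusiones"):
    print(line)
```

Reading down the aligned left-hand column is what makes a recurring unit such as límites y exclusiones easy to spot.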
Once all the necessary information has been gathered from the travel insurance
corpus, the translator is in a position to offer a translation of both extracts. It is es-
sential to take into account all the points that have been outlined so far given their
importance when it comes to segmenting and reorganising the information in the
target text (TT). The following are suggested translations of Extracts 1 and 2.
Extract 1 (TT):
MUY IMPORTANTE
Esta es su póliza de asistencia en viaje. En ella se incluyen las
garantías, límites y exclusiones de los Asegurados y a partir de las cuales
podrá efectuarse cualquier reclamación.
Extract 2 (TT):
6. Conclusion
References
ACT. 2005. Primer estudio de mercado de los servicios de traducción profesional en España de la
Asociación de Empresas de Traducción (ACT). Madrid: ACT.
Almahano Güeto, I. 2002. El contrato de viaje combinado en alemán y español: Las condiciones
generales. Un estudio basado en corpus. PhD Thesis. Málaga: Universidad de Málaga.
Aston, G. (ed.). 2001. Learning with Corpora. Bologna: CLUEB.
Aurioles Martín, A. 2005 [2002]. Introducción al Derecho Turístico (Derecho Privado del Turis-
mo). Madrid: Tecnos.
Aurioles Martín, A., Benavides Velasco, P. G. and González Fernández, M. B. 2004. Contrata-
ción Turística. Technical document BFF2003-04616 MCYT/TI-DT-2004-1. 1–12. <http://
turicor.com/privada/documentos/TI-DT-2004-1.pdf>. [14/03/2007].
Austermühl, F. 2001. Electronic Tools for Translators. Manchester: St. Jerome.
Bernardini, S. and Zanettin, F. (eds). 2000. I corpora nella didattica della traduzione. Corpus Use
and Learning to Translate. Bologna: CLUEB.
Biber, D., Conrad, S. and Reppen, R. 1998. Corpus Linguistics: Investigating Language Structure
and Use. Cambridge: Cambridge University Press.
Bowker, L. 2002. Computer-Aided Translation Technology: A Practical Introduction. Ottawa:
University of Ottawa Press.
Bowker, L. and Pearson, J. 2002. Working with Specialized Language: A practical guide to using
corpora. London: Routledge.
Braun, E. 2005 [1996]. “El caos ordena la lingüística. La ley de Zipf.” In Caos, fractales y cosas
raras, E. Braun (ed.). Mexico D.F.: Fondo de Cultura Económica.
<http://omega.ilce.edu.mx:3000/sites/ciencia/volumen3/ciencia3/150/htm/caos.htm> [14/03/2007].
Carrasco Jiménez, R. C. 2003. La ley de Zipf en la Biblioteca Miguel de Cervantes. Alicante: Uni-
versidad de Alicante. <http://www.dlsi.ua.es/asignaturas/aa/Zipf.pdf> [14/03/2007].
CORIS/CODIS. 2006. “Progettazione e costruzione di un Corpus di Italiano Scritto.” CO-
RIS/CODIS. Bologna: CILTA. <http://corpus.cilta.unibo.it:8080/coris_itaProgett.html>
[14/03/2007].
Corpas Pastor, G. 2001. “Compilación de un corpus ad hoc para la enseñanza de la traducción
inversa especializada.” Trans: Revista de Traductología 5: 155–184.
Corpas Pastor, G. (ed.) 2003a. Recursos documentales y técnicos para la traducción del discurso
jurídico (español, alemán, inglés, italiano, árabe). Granada: Comares.
Corpas Pastor, G. 2003b. “Diseño de un tipologizador para la traducción jurídica: Del corpus
al prototipo textual.” In Recursos documentales y técnicos para la traducción del discurso
jurídico (español, alemán, inglés, italiano, árabe), G. Corpas Pastor (ed.), 33–58. Granada:
Comares.
Corpas Pastor, G. 2004a. “Localización de recursos y compilación de corpus vía Internet: Apli-
caciones para la didáctica de la traducción médica especializada.” In Manual de documen-
tación y terminología para la traducción especializada, C. Gonzalo García and V. García
Yebra (eds), 223–257. Madrid: Arco/Libros.
Corpas Pastor, G. 2004b. “The Turicor Project: Work in Progress.” Revista Europea de Derecho
de la Navegación Marítima y Aeronáutica xx: 1–14. <http://turicor.com/pdf/corpas2004b.pdf>
[14/03/2007].
Corpas Pastor, G. 2004c. “La traducción de textos médicos especializados a través de recursos
electrónicos y corpus virtuales.” In Las palabras del traductor. Actas del II Congreso Inter-
nacional «El español, lengua de traducción», 20 y 21 de mayo, Toledo 2004, L. González and
P. Hernúñez (eds), 137–164. Brussels: Comisión Europea/ESLETRA. <http://www.turicor.
com/pdf/corpas2004c.pdf> [14/03/2007].
Corpas Pastor, G. and Seghiri, M. 2006a. El concepto de representatividad en la Lingüística del
Corpus: Aproximaciones teóricas y metodológicas. Technical document BFF2003-04616
MCYT/TI-DT-2006-1.
Corpas Pastor, G. and Seghiri, M. 2006b. “Recursos documentales para la traducción de se-
guros turísticos en el par de lenguas inglés-español.” In Investigación y traducción: Una
mirada al presente en la labor investigadora y en el ejercicio de la profesión de la licenciatura
Traducción e Interpretación, E. Postigo Pinazo (ed.). Málaga: Universidad de Málaga.
Corpas Pastor, G. and Seghiri, M. 2007a. “Specialized Corpora for Translators: A Quantitative
Method to Determine Representativeness.” Translation Journal 11 (3).
<http://translationjournal.net/journal/41corpus.htm> [14/03/2007].
Corpas Pastor, G. and Seghiri, M. 2007b. “Determinación del umbral de representatividad de
un corpus mediante el algoritmo N-Cor.” Procesamiento del Lenguaje Natural 39: 165–172.
<http://www.sepln.org/revistaSEPLN/revista/39/20.pdf> [14/03/2007].
Corpas Pastor, G. and Seghiri, M. Forthcoming. El concepto de representatividad en lingüística
de corpus: Aproximaciones teóricas y consecuencias para la traducción. Málaga: Servicio de
Publicaciones de la Universidad.
Council Directive 73/240/EEC of 24 July 1973 abolishing restrictions on freedom of establish-
ment in the business of direct insurance other than life assurance.
Council Directive 76/580/EEC of 29 June 1976 amending Directive 73/239/EEC on the coor-
dination of laws, regulations and administrative provisions relating to the taking up and
pursuit of the business of direct insurance other than life assurance.
Council Directive 78/473/EEC of 30 May 1978 on the coordination of laws, regulations and
administrative provisions relating to Community co-insurance.
Council Directive 84/641/EEC of 10 December 1984 amending, particularly as regards tourist
assistance, the First Directive (73/239/EEC) on the coordination of laws, regulations and
administrative provisions relating to the taking-up and pursuit of the business of direct
insurance other than life assurance.
Council Directive 87/343/EEC of 22 June 1987 amending, as regards credit insurance and sure-
tyship insurance, First Directive 73/239/EEC on the coordination of laws, regulations and
administrative provisions relating to the taking-up and pursuit of the business of direct
insurance other than life assurance.
Council Directive 87/344/EEC of 22 June 1987 on the coordination of laws, regulations and
administrative provisions relating to legal expenses insurance.
Council Directive 90/618/EEC of 8 November 1990, amending, particularly as regards motor
vehicle liability insurance, first Council Directive 73/239/EEC and second Council Direc-
tive 88/357/EEC on the coordination of laws, regulations and administrative provisions
relating to direct insurance other than life assurance.
Council Directive 92/49/EEC of 18 June 1992 on the coordination of laws, regulations and ad-
ministrative provisions relating to direct insurance other than life assurance and amending
Directives 73/239/EEC and 88/357/EEC (third non-life insurance Directive).
Council Directive 92/96/EEC of 10 November 1992 on the coordination of laws, regulations
and administrative provisions relating to direct life assurance and amending Directives
79/267/EEC and 90/619/EEC (third life assurance Directive).
Directive 2000/26/EC of the European Parliament and of the Council of 16 May 2000 on the
approximation of the laws of the Member States relating to insurance against civil liability
in respect of the use of motor vehicles and amending Council Directives 73/239/EEC and
88/357/EEC.
Directive 2000/64/EC of the European Parliament and of the Council of 7 November 2000
amending Council Directives 85/611/EEC, 92/49/EEC, 92/96/EEC and 93/22/EEC as re-
gards exchange of information with third countries.
Directive 2002/13/EC of the European Parliament and of the Council of 5 March 2002 amend-
ing Council Directive 73/239/EEC as regards the solvency margin requirements for non-
life insurance undertakings.
Directive 2002/92/EC of the European Parliament and of the Council of 9 December 2002 on
insurance mediation.
European Parliament and Council Directive 95/26/EC of 29 June 1995 amending Directives
77/780/EEC and 89/646/EEC in the field of credit institutions, Directives 73/239/EEC and
92/49/EEC in the field of non-life insurance, Directives 79/267/EEC and 92/96/EEC in the
field of life assurance, Directive 93/22/EEC in the field of investment firms and Directive
85/611/EEC in the field of undertakings for collective investment in transferable securities
(Ucits), with a view to reinforcing prudential supervision.
First Council Directive 73/239/EEC of 24 July 1973 on the coordination of laws, regulations
and administrative provisions relating to the taking-up and pursuit of the business of di-
rect insurance other than life assurance.
Fletcher, W. H. 2004. “Facilitating the Compilation and Dissemination of Ad-Hoc Web Corpora.”
In The Fifth International Conference on Teaching and Language Corpora, G. Aston,
S. Bernardini and D. Stewart (eds), 1–18. Amsterdam: Benjamins.
<http://www.kwicfinder.com/Facilitating_Compilation_and_Dissemination_of_Ad-Hoc_Web_Corpora.pdf>
[14/03/2007].
Giouli, V. and Piperidis, S. 2002. Corpora and HLT. Current trends in corpus processing and
annotation. Bulgaria: Institute for Language and Speech Processing.
<http://www.larflast.bas.bg/balric/eng_files/corpora1.php> [14/03/2007].
Granger, S. and Petch-Tyson, S. (eds). 2003. Extending the Scope of Corpus-Based Research: New
Applications, New Challenges. Amsterdam and Atlanta: Rodopi.
Heaps, H. S. 1978. Information Retrieval: Computational and Theoretical Aspects. New York:
Academic Press.
Insurance Act 2000.
Kenny, D. 2001. Lexis and Creativity in Translation. A Corpus-based Study. Manchester: St.
Jerome.
Lavid López, J. 2005. Lenguaje y nuevas tecnologías: nuevas perspectivas, métodos y herramientas
para el lingüista del siglo XXI. Madrid: Cátedra.
Laviosa, S. (ed.). 1998. L’approche basée sur le corpus / The Corpus-based Approach, Meta 43 (4).
Ley 18/1997, de 13 de mayo, de modificaciones del artículo 8 de la Ley de Contrato de Seguro,
para garantizar la plena utilización de todas las lenguas oficiales en la redacción de los
contratos. BOE. 0115 de 14 de mayo de 1997.
Ley 30/1995, de 8 de noviembre, de ordenación y supervisión de los Seguros Privados.
Ley 50/1980, de 8 de octubre, del Contrato de Seguro.
Moreiro González, J. A. 2002. “Aplicaciones al análisis automático del contenido provenientes
de la teoría matemática de la información.” Anales de documentación 5: 273–286. <http://
www.um.es/fccd/anales/ad05/ad0515.pdf> [14/03/2007].
Orden Ministerial de 27 de enero de 1988 por la que se califica la cobertura de las prestaciones
de asistencia en viaje como operación de seguro privado.
Pearson, J. 1998. Terms in Context, Studies in Corpus Linguistics. Amsterdam/Philadelphia:
John Benjamins.
Radev, D., Fan, W., Qi, H., Wu, H. and Grewal, A. 2005. “Probabilistic question answering on
the web.” Journal of the American Society for Information Science and Technology (JASIST)
56 (6): 571–583. <http://filebox.vt.edu/users/wfan/paper/www/www.pdf> [14/03/2007].
Sanahuja, S. and Silva, A. 2001. “Muestreo teórico y estudios del discurso. Una propuesta teóri-
co-metodológica para la generación de categorías significativas en el campo del Análisis
del Discurso.” El Estudio del Discurso: Metodología Multidisciplinaria. II Coloquio Nacional
de Investigadores en Estudios del Discurso. La Plata, 6 al 8 de septiembre de 2001. Buenos
Aires: Asociación Latinoamericana de Estudios del Discurso and Universidad Nacional
del Centro de la Provincia de Buenos Aires. <http://www.sai.com.ar/KUCORIA/discurso.
html> [14/03/2007].
Sánchez-Gijón, P. 2003a. “És la web pública la nova biblioteca del traductor?” Tradumàtica:
Traducció i tecnologies de la informació i la comunicació 2: 1–7. <http://www.bib.uab.es/
pub/tradumatica/15787559n2a7.pdf> [14/03/2007].
Sánchez-Gijón, P. 2003b. Els documents digitals especialitzats: utilització de la lingüística de cor-
pus com a font de recursos per a la traducció. PhD Thesis. Barcelona: Universidad Autóno-
ma de Barcelona.
Sánchez Pérez, A. and Cantos Gómez, P. 1997. “Predictability of Word Forms (Types) and Lem-
mas in Linguistic Corpora. A Case Study Based on the Analysis of the CUMBRE Corpus:
Pilar Sánchez-Gijón
Universitat Autònoma de Barcelona, Spain
This chapter presents the case for systematic use of do-it-yourself corpora in
specialised translation courses, focusing in particular on the use of corpora as a
documentation resource. An overview is given of the importance of documen-
tation in professional translation, its place in different translation competence
models and the advantages and disadvantages of how it is taught in different
translator training centres. Having reached the conclusion that documentation
skills for translation are best acquired in a translation course as a tool to solve
specific translation problems, the author suggests a protocol to help students
create their own DIY corpus in specialised translation courses. The proposal is
illustrated by examples of problems related to the translation of an instruction
manual for an air conditioning system.
. Contents that are sometimes closely linked to librarianship, like creating a book register or
summarising a text.
112 Pilar Sánchez-Gijón
Subjects included in this category are present in practically all translation curri-
cula in European universities. They are subjects that have to do with the treatment
and solution of specific problems related to translation. They may include subjects
dealing with terminology and language for specific purposes, and subjects dealing
with translation software.
Subjects dedicated to terminology and language for specific purposes usually
show a marked orientation towards terminological methodologies, of which
documentation forms an essential part (Maia 2003), although documentation may
not necessarily be taught from a translation perspective. Terminological decisions
are usually based on original texts in each language, without taking into account
the function of the target text or the translation context. Nevertheless, the
processes, techniques and strategies used to acquire documents prior to
decision-making may be considered valid for the documentation process within the
context of translation.
Subjects dealing with translation tools tend to stress the importance of docu-
mentation. With few exceptions, translation tools with documentation functions
usually use the translators’ own output as a documentation resource. Two tools
that are often used by translators and taught to trainees are terminology manage-
ment tools and memory-based computer-assisted translation systems.
. Translation tools is a term that includes both specific tools for translation – in this context
mainly computer-assisted translation (CAT) tools – and tools for resource consultation or de-
velopment (such as corpus tools and terminology management tools).
. The main exponent may be CAT tools based on translation memory.
DIY corpora in the specialised translation course 113
. For instance, Transit’s concordance search, which allows users to search for segments
containing several words that need not be adjacent.
Faced with a translation task, translation students learn that professional transla-
tors may look for texts and create their own DIY corpus from the following three
sources:
1. The client.
2. Specialist centres.
3. The Internet, as a means of obtaining information in multiple digital
formats.
In the translation classroom these conditions can also be reproduced. First, the
lecturer may act as the client and provide documentation that students may
incorporate into their DIY corpus. By doing this, however, the lecturer is not
helping students acquire or improve their documentation skills. Lecturers may
also give students access to specialist sources of documentation, such as
databases, academic articles and institutional documentation centres, or
encourage them to consult these sources. By relying on one of these information
centres, the lecturer can be confident that the information obtained by students
is reliable and homogeneous in terms of subject matter, text format and/or text
genre. Unfortunately, resources of this kind are not always available to our
students for every subject domain.
. These examples were worked on with English to Spanish postgraduate translation and
localisation students of the Autonomous University of Barcelona and the Jaume I University
during the 2006–2007 academic year.
Resorting to the Internet to access texts for a corpus is the most viable option
for documenting any translation task. In the previous section we mentioned the
absence of quality controls for many of the texts retrieved from the Internet. It
might seem that this documentation process would be affected in the same way,
since each text included in the corpus is of no more and no less quality than the
parallel texts consulted in conventional documentation. However, by collecting a
large number of texts in a DIY corpus, analysing them together and observing the
different phenomena quantitatively, we can be sure that any translation decision
will be based on a number of different texts, that is to say, on a number of
different authors. Thus, even if each individual text may not be particularly
reliable, the analysis of all the texts taken together will be. In other words,
finding, for instance, the same terminological solution in texts written by
different authors points to a consensus in the use of this solution. Thanks to
corpus linguistics methodology, every text may be validated by other texts in the
DIY corpus, which helps to ensure quality in the analysis.
Using the Internet as a means to access texts to build a corpus does, however,
have its problems and these must be taken into consideration. If students are to
build their own corpus, a systematic approach should be taken and three different
stages should be clearly differentiated.
3.1.1 Determining the characteristics of the resource that will provide the texts
A search engine or directory is usually used to carry out searches on the Internet
and to obtain a list of texts that may be included in the DIY corpus constructed by
students. As well as general search engines such as Google or Yahoo!, it is worthwhile
consulting specialised search engines that index only specific resources. Two types
of resource may be particularly useful for preparing a DIY corpus on a specific subject:
. Varantola (2003, 64) pointed out the unreliability of many Internet texts as a drawback that
may affect the construction of this kind of corpus. It was also pointed out as a shortcoming by
Zanettin (2002, 12): “The relevance and reliability of documents to be included in the corpus
needs to be carefully assessed.”
– Using key words or expressions that identify the term within the context of
subject matter (e.g. “aire acondicionado” + instalación).
– Eliminating keywords that lead to heterogeneous subject matter (e.g. “aire
acondicionado” -“bomba de calor”).
– Including words related to the genre or text type that they were most inter-
ested in (e.g. “aire acondicionado” + “manual de instrucciones”11 + véase12).
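The three strategies above can be combined mechanically. The sketch below is a hypothetical helper, not any search engine's API. It assembles a query string from required terms, excluded terms, genre markers and an optional file type, using the operators described in footnotes 10 and 15 (quoted phrases, +, - and ext:). Search engines have changed their operator support since these searches were carried out, so the output should be checked against the current documentation of whichever engine is used.

```python
def build_query(required, excluded=(), genre_markers=(), filetype=None):
    """Compose a search string from required terms, excluded terms and genre markers."""

    def quote(term):
        # Multi-word terms are quoted so that they are matched as exact phrases
        return f'"{term}"' if " " in term else term

    parts = [quote(required[0])]
    # '+' marks further terms that must appear (older Google syntax)
    parts += ["+" + quote(t) for t in list(required[1:]) + list(genre_markers)]
    # '-' excludes pages that contain the term
    parts += ["-" + quote(t) for t in excluded]
    if filetype:
        parts.append("ext:" + filetype)  # restrict results to one file type
    return " ".join(parts)


print(build_query(["aire acondicionado", "instalación"]))
print(build_query(["aire acondicionado"], excluded=["bomba de calor"]))
print(build_query(["aire acondicionado"], genre_markers=["manual de instrucciones", "véase"]))
print(build_query(["aire acondicionado", "manual de instrucciones"], filetype="pdf"))
```

Generating queries this way makes it easy to vary one criterion at a time and compare how homogeneous the resulting hit lists are.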
10. All the following specific searches are expressed according to Google search syntax and
operators.
11. “Manual de instrucciones” (“User guide” in English): using the text genre as a search key
word. In this case, most of these texts include the genre name as part of their title.
12. A typical expression used in this genre. Expressions like this, which are linked to a genre,
are very useful for delimiting searches and obtaining homogeneous results from the genre point
of view.
Downloading texts correctly. According to Samson (2005, 102), one of the four
cross-curricular skills related to computer use (or, as he calls it, ‘computer
literacy’) in translation is file management.16 To use the texts retrieved by the
search in a DIY corpus, students must download them so that they can be processed locally.
When downloading texts, students may save time, and avoid possible technical
problems, if they know what they want to download and what type of documents
they are dealing with. Given that only the textual elements of the retrieved
documents may be included in the DIY corpus, non-textual elements such as
pictures, sound, etc. need not be downloaded. This minimises both the time taken to
download documents and the space they occupy on computers. In
order to help develop students’ skills in computer use, it should be pointed out
that documents may be downloaded in two different ways:
13. The search for synonyms in Google may be carried out using the sign ~. For example, a
search for ~physician will lead to results that include this word or words from the same seman-
tic field, such as doctor, medical and hospital.
14. Proximity searches may be carried out using Exalead and the operator NEAR, which
may be used to link key words (e.g. “aire acondicionado” NEAR instalación) or key words that
are specific to the style of a genre or text type (e.g. “aire acondicionado” NEAR “para más
información”).
15. An example of a search by file type: “aire acondicionado” + “manual de instrucciones” ext:pdf.
Only files containing these key words will be retrieved, and all of them will be in PDF, the most
usual format for user manuals on the Internet.
16. These skills include: configuration of the user’s workstation, file management, digital text
production (word processing) and basic Internet use. Two of these four skills, file management
and basic Internet use, are worked on and improved through the kind of translation activities
or tasks that are proposed in this paper.
17. Client-based meta-search applications can carry out searches on different search engines
(hence their name). Their output is a single list of results obtained from the different search
engines that can be downloaded.
– If the files are in formats other than HTML or its derivatives, they must be
converted to plain text (txt) using a suitable conversion program.18 Such programs
generally convert all the files in a directory in one go, so this operation is
carried out almost automatically.
– If the files are in HTML format or its derivatives, most corpus tools can process
them directly. However, depending on how these files were originally encoded,
it may be necessary to convert some codes into special characters.19 This may be
done using a format converter or even some corpus tools (e.g. WordSmith Tools20).
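Both cases can be handled with Python's standard library alone. The sketch below is only one possible approach and not a replacement for a dedicated converter. It extracts the text content of an HTML page, skips script and style elements, and relies on `html.parser.HTMLParser` with `convert_charrefs=True` to turn character references such as &Aacute; into the corresponding characters. The class name `TextExtractor`, the helper `html_to_text` and the set of block-level tags are assumptions for illustration.

```python
from html.parser import HTMLParser

# Hypothetical choice of tags after which a space is added so adjacent blocks do not merge
BLOCK_TAGS = {"p", "br", "div", "li", "td", "tr", "h1", "h2", "h3", "title"}


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, ignoring scripts and styles."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode references like &Aacute; or &#209;
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html_source):
    """Return the plain text of an HTML string with whitespace normalised."""
    parser = TextExtractor()
    parser.feed(html_source)
    parser.close()
    return " ".join("".join(parser.chunks).split())


print(html_to_text("<p>&Aacute;rea de instalaci&oacute;n</p><script>var x = 1;</script>"))
```

To convert a whole download folder, the same function could be applied to each file after reading it with the correct encoding, and the result saved as .txt for the corpus tool.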
All these operations require computer skills that students should have acquired
during their training, since they are not specific to translator training. However, it
is often necessary to ensure that all the students in the translation class have suffi-
cient instrumental competence to allow them to carry out these tasks successfully.
It is in the translation classroom that students’ computer skills are contextualised
and become part of their translation competence.
Once the operations described in Section 3.1 have been carried out, a DIY corpus
of possibly several tens of thousands of words may be constructed. The source
text used to illustrate the process of documentation described in this article is
an English instruction manual for an air conditioning system manufactured by
Carrier Heating & Cooling (Model OM38–45), published in Indianapolis in 1998.
The manual was to be translated into Spanish and a corpus was built of some
30,000 words in Spanish on the subject of air conditioning. The use of the DIY
corpus in specialised translation will provide us with the factual and/or linguistic
18. The most common conversion applications used in this kind of activity are those that con-
vert pdf files into txt files. There are many different applications, both freeware and shareware,
that may be located through any search engine.
19. This is usually the case with accented letters or special characters used in some languages
(e.g. Ñ or Ç). In HTML documents these may appear as a code (a character entity), which is then
interpreted by the browser and displayed as the appropriate character. For example, Á may be
represented in the code as &Aacute;.
20. Scott 1996.
information necessary for the translator to complete the translation task. It is
usually the corpus compiled in the source language that provides most factual
information, since that is the language in which cognitive problems will occur
(Sánchez-Gijón 2005). However, it is the corpus in the target language that most
helps students to identify and solve many of the linguistic problems related to
knowledge shortcomings: they understand the underlying concept of a specific
source language term, but they do not know how to express it in the target
language, or do not even realise that it is a fixed term and not just a casual
expression.21
In cases in which translators are non-native speakers of the source text lan-
guage, specific linguistic problems that occur in the source text will be used as
examples of problems that may be solved using students’ linguistic and transla-
tion intuition with a DIY corpus as the only resource available for validating that
intuition.
21. This is the case with current room temperature, which is discussed below.
the text genre we are working in, so no decision can be made without checking the
accepted expression. To do this, a wildcard search for persona* is made to retrieve
practically all the word forms built on this root. Table 2 shows all the concordances
obtained.
Of all the possible combinations, clearly the most common is lesiones perso-
nales in the plural. Two verbs that accompany lesiones personales have also been
identified. These are evitar – which coincides semantically with avoid, the verb
accompanying personal injury in the original text – and producir.
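A wildcard query such as persona* simply retrieves every word form that begins with the given string. The following sketch assumes plain-text input and a simple regular-expression tokeniser. It counts the forms found and the word immediately to their left, which is enough to make a pattern like lesiones personales stand out. The sentences are invented for illustration and are not the concordances in Table 2. The helper name `wildcard_forms` is hypothetical.

```python
import re
from collections import Counter

TOKEN = re.compile(r"\w+")  # Unicode-aware in Python 3, so á, é, ñ count as letters


def wildcard_forms(texts, prefix):
    """Emulate a query like 'persona*': count forms starting with prefix
    and the word immediately preceding each occurrence."""
    forms, with_left = Counter(), Counter()
    for text in texts:
        tokens = [t.lower() for t in TOKEN.findall(text)]
        for i, tok in enumerate(tokens):
            if tok.startswith(prefix):
                forms[tok] += 1
                if i > 0:
                    with_left[(tokens[i - 1], tok)] += 1
    return forms, with_left


texts = [
    "Para evitar lesiones personales, lea este manual.",
    "Puede producir lesiones personales o daños materiales.",
    "Cualquier persona que manipule la unidad debe estar cualificada.",
    "Riesgo de lesión personal.",
]
forms, with_left = wildcard_forms(texts, "persona")
print(forms.most_common())
print(with_left.most_common(1))
```

Extending the left-hand context to include one more word would also surface the accompanying verbs, such as evitar and producir in this toy sample.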
22. Teaching materials designed by Patricia Rodríguez and Pilar Sánchez-Gijón for the UAB
Master in Tradumàtica: Translation and Information Technologies and first used in the class on
“Using electronic corpora” in 2003–2004.
The second most common verb form observed is the infinitive ser. The results
of a more detailed analysis of the concordances of ser (Table 4) show that most of
the structures are MODAL VERB (deber, poder, …) + SER + PARTICIPLE. Once
again, this structure is used to give instructions when the agent who is to carry out
the instructions does not appear explicitly.
Table 4. Concordances of ser
ser suficientes para mantener el sistema funcionando
l ventilador exterior deba ser desconecta- resolución de
l ventilador exterior deba ser desconecta- el peor de los
ación de su hogar debe ser inspecciona- unidad.
OLUCIÓN: su hogar debe ser inspeccionado con fre-
TE MANUAL DEBE SER ENTREGADO AL
TE MANUAL DEBE SER ENTREGADO AL
relativa no debe ser menor del 20%
humedad relativano debe ser menor del 20% o mayor
de suspensión debe ser de retícula de Te
ma de suspensión debe ser de retícula de Te condici
con acumulador, y debe ser periódicamente reemplaza
te y que el con- deben ser inspeccionados en este m
ESOLUCIÓN: deben ser inspeccionados en este
talarse en la vista deben ser arreglados para que luzcan
te, el compresor deberá ser volteado con el sello frontal
deshidratadores. Deberá ser examinada y remplazada, si
aceite requerida deberá ser vertida en el acumulador o e
os o abollados deberán ser completamente enderezado
n muchos casos deberán ser reemplazados. Lubricación
ntos corrosivos deberán ser removidos a fin de prevenir
tes fallarán y deberán ser limpio instalado, debería
s importante, debiendo ser inspeccionado antes de la
antenimiento cuando el ser- Los filtros desechables d
medad que haya podido ser absorbido por el aceite PAG
12 o el R134a podrán ser usados para mantener el d
btener servicio. Podría ser necesario limpiar el serpent
ectores de aire podrían ser muy fríos o muy calientes
de tores de aire podrían ser muy fríos o muy calientes pa
cual no tendrá porqué ser drenada. No obstante, si
l de temperatura puede ser un cuadrante, trabajar du
ol de temperatura puede ser un cuadrante, una Al operar
termostato puede ser PROGRAMABLE o
Su termostato puede ser PROGRAMABLE o NO P
ecificaciones pueden ser encontradas en el Temp
o especial para re- ser suficientes para mantener
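The MODAL VERB + SER + PARTICIPLE pattern identified above can be approximated with a regular expression. This is a rough heuristic under several assumptions. The list of modal forms is hypothetical and incomplete. Participles are recognised only by the regular endings -ado/-ido (and their feminine and plural variants), so irregular participles are missed and some non-participles may slip through. One optional -mente adverb is allowed between ser and the participle. The example sentences are invented and only loosely modelled on the kind of concordances shown in Table 4.

```python
import re
from collections import Counter

# Hypothetical, incomplete list of conjugated forms of deber and poder
MODALS = (r"(?:deben|debe|deba|deberán|deberá|deberían|debería|debiendo"
          r"|pueden|puede|podrán|podrá|podrían|podría)")
# Regular participles only: stem + -ad-/-id- + o/a/os/as
PARTICIPLE = r"\w+(?:ad|id)(?:os|as|o|a)"
PATTERN = re.compile(
    rf"\b{MODALS}\s+ser\s+(?:\w+mente\s+)?({PARTICIPLE})\b", re.IGNORECASE
)


def modal_passives(lines):
    """Count the participles found in MODAL + ser + PARTICIPLE sequences."""
    found = Counter()
    for line in lines:
        for match in PATTERN.finditer(line):
            found[match.group(1).lower()] += 1
    return found


# Invented example sentences
sample = [
    "El filtro debe ser reemplazado cada mes.",
    "Los serpentines deben ser inspeccionados anualmente.",
    "La unidad debe ser periódicamente revisada.",
    "La humedad relativa no debe ser menor del 20%.",
    "EL MANUAL DEBE SER ENTREGADO AL PROPIETARIO.",
]
print(modal_passives(sample))
```

The fourth sentence is correctly ignored, because menor is not a participle. With a real corpus the matches would still need to be checked by hand.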
So far students may have discovered that the reader is not usually addressed
directly, but they still do not know how the author addresses the reader when the
address is explicit and direct. If they had a lemmatised word list, they could
analyse the concordances of all the conjugated forms of a particular verb. As
they do not have such a list (since this DIY corpus is not lemmatised), students
can work down the list of the most frequent words until they reach the
first conjugated form of a verb that is not a modal verb. The first such verb on the
list is consulte. Since consultar is a regular verb, all the concordances of consult*
may be extracted in order to obtain all its conjugated forms.
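A frequency word list of the kind described here can be produced in a few lines. The sketch below builds an unlemmatised list, sorted by descending frequency and then alphabetically. It then filters the list for the prefix consult, mirroring the consult* query. Deciding which word on the list is the first non-modal verb remains a manual step, since no part-of-speech information is used. The helper `frequency_list` and the sample sentences are hypothetical.

```python
import re
from collections import Counter


def frequency_list(texts):
    """Return (word, frequency) pairs, most frequent first, ties sorted alphabetically."""
    counts = Counter(word.lower() for text in texts for word in re.findall(r"\w+", text))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


texts = [
    "Consulte el manual antes de usar la unidad.",
    "Consulte a su distribuidor.",
    "El filtro debe ser limpiado.",
]
word_list = frequency_list(texts)
print(word_list[:3])
# Emulate the consult* query on the unlemmatised word list
print([word for word, freq in word_list if word.startswith("consult")])
```

Because the list is not lemmatised, each inflected form appears as a separate entry. This is why the prefix query is needed to gather all the forms of consultar.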
The results (Table 5) show that the only conjugated form of the verb consultar
used in this corpus is consulte, the formal second person singular form (used with
Usted). This form is used to address the reader, although Usted itself is elided.
The other verb forms in the list confirm this level of formality in addressing the
reader, and there are a few examples in which Usted is included. The analysis of
the Spanish corpus has thus revealed that in this textual genre authors address
readers formally, though the most common practice is to use impersonal
structures with no direct reference to the reader.
4. Conclusions
One of the main conclusions that may be drawn from this paper is that students
must have a minimum level of instrumental/documentary competence not only
126 Pilar Sánchez-Gijón
References
Askehabe, I. 2000. “The Internet for teaching translation”. Perspectives: Studies in Translatology
8, 2: 135–143.
Aston, G. 1999. “Corpus use and learning to translate”. Textus 12, 2: 289–314.
Autermühl, F. 2006. “Training Translators to Localize”. In Translation Technology and its Teach-
ing (with much mention of localization), Pym, A., Perekrestenko, A., Starink, B. (eds). Inter-
cultural Studies Group, Universitat Rovira i Virgili. < http://isg.urv.es/publicity/isg/publi-
cations/technology_2006/index.htm>
Bowker, L. 2002. “Working Together: A Collaborative Approach to DIY Corpora”. In Language
Resources for Translation Work and Research – LREC Workshop #8. <http://mt-archive.
info/LREC-2002-WS-LangResTransl.pdf> 29–32.
Cabré Castellví, M. T. 2001. “Consecuencias metodológicas de la propuesta teórica (I)”. In La
terminología científico-técnica: reconocimiento, análisis y extracción de información formal
y semántica, Cabré Castellví, M. T., Feliu, J. (eds), 19–25. Barcelona: IULA – UPF.
Codina, L. 2002. “Información documental e información digital”. In Manual de Ciencias de la
Documentación, Lopéz-Yepes, J. (ed.), 301–316. Madrid: Pirámide.
Corpas, G. 2001. “Compilación de un corpus ad hoc para la enseñanza de la traducción inversa
especializada”. TRANS. Revista de traductología 5: 155–184.
Faber, P. 2002. “Investigar en Terminología”. In Investigar en Terminología. Interlingua, Faber, P.,
Jiménez Hurtado, C. (eds), 3–23. Granada: Editorial Comares.
Gamero, S. 1998. La traducción de textos técnicos (alemán-español). Géneros y subgéneros. [PhD
Thesis]. Universitat Autònoma de Barcelona.
Hundt, M., Nesselhauf, N., Biewer, C. (eds). 2007. Corpus Linguistics and the Web. Amsterdam/
New York: Rodopi.
Electronic corpora and corpus analysis tools are resources that can improve
the way students acquire translation competence. If, as translator trainers, we
wish to develop our students’ competence to solve translation problems, then
we need to provide them with strategies to use existing resources and tools, to
create new ones and to reap the maximum benefit possible from them. We ad-
vocate a type of training that facilitates the development of students’ strategies,
and attempts to evaluate the acquisition of these strategies.
Our methodological approach is based on translation tasks organised around
learning objectives and includes evaluation of the translation process and prod-
uct. This methodology is student-centred, since it allows the student to be the
focus of the learning process, and comprehensive, in that it takes into account
the objectives and all aspects of the learning context in order to develop appro-
priate materials and evaluation.
We suggest that if one of the learning objectives within a translation course
is to grasp how to use corpora, evaluation of this objective should include the
process and not be limited to the overall quality of the product – the translation.
Examples are given of how the use of corpora and corpus-related software can
be evaluated other than by simply examining the final translation. The results of
some of the students’ own evaluations of the methodology are included.
1. Introduction
Many changes have occurred in the translation profession over the last few de-
cades in terms of the quantity of texts that need to be translated in a wide variety
of fields, the speed at which translations are required, the diversity of document
130 Patricia Rodríguez Inés
formats used, etc. We sometimes wonder how translators were able to do their
job before electronic resources and tools were widely available. Every phase in
a translation project can be assisted by a series of computer tools and resources,
ranging from specialised search engines to translation memories or spelling and
grammar checkers which contribute to making the process faster and more ac-
curate and to providing a product of higher quality. As for the aids available to
the translator in the documentation phase, electronic corpora and corpus analysis
tools can improve not only the way the profession is practised, but also the way
translation teachers teach and translation students are trained.
Today, translation students increasingly need to be trained to cope with a world
that is ever more demanding in terms of IT skills. We may, however, wonder how
translation teachers can be expected to provide specialised insight into many
fields of knowledge and, for example, keep pace with developments in science and
technology. Traditionally, teachers were supposed to have an answer for every
question, i.e. they were regarded as information providers. However, given the
requirements of the professional market awaiting translation students, the opti-
mal role a translation educator can play nowadays is that of an information facili-
tator, which involves enriching students’ learning processes, helping them wher-
ever necessary and, above all, stimulating the development of their operative
knowledge, i.e. their know-how. Translation trainees need to develop skills that
they can apply to new situations in order to solve new problems, and to acquire a
certain degree of expert knowledge even under the tightest of deadlines.
Translation courses need to be designed to help students acquire these skills.
The first part of this chapter focuses on the advantages of incorporating the learn-
ing of corpus use into a task-based translation methodology. Translation teaching with
corpora constitutes a step forward in relation to traditional translation teaching,
since the use of corpora reduces the prominence of the teacher’s intuition in the
classroom and increases the importance of the student, as well as that of the cor-
pus as a documentary resource. The second part of the chapter is concerned with
evaluating this methodology. Examples are given of how the use of corpora and
corpus-related software can be evaluated other than by simply examining the final
translation. The results of some of the students’ own evaluations of the methodol-
ogy are included.
2. Theoretical background
from a methodology that relies almost completely on the teacher’s knowledge and
experience to a methodology that focuses on the students’ needs (for instance,
selection and organisation of materials on the basis of the student’s needs analy-
sis, student self-evaluation, etc.). In reality, however, these changes have not ac-
tually been implemented in many translation courses. Therefore, reform is still
necessary in some centres if translation teaching is to evolve from a methodology
based on the teacher’s intuition to a corpus-based approach that gives the student
greater responsibility. In our teaching proposal, learning how to use corpora to
translate is a learning objective and is linked to translation competence. Compe-
tence is understood as the interaction between knowledge, skills and attitudes for
the purpose of carrying out a task in an appropriate way (see Figure 1).
We are in the middle of the Bologna process, which promotes competence-
based learning within the European Space for Higher Education. These compe-
tences should be applicable to the professional world. Corpus-based work offers a
wide range of possibilities in the translation classroom and can be easily adapted
to competence-based training, focusing on “learning how to learn” and profes-
sional requirements. Work with corpora is based on activities that involve search-
ing and analysing data, and therefore strengthens the sense of learning through
discovery, as well as through reorganising and building upon previous knowledge.
Furthermore, corpus-based learning includes the possibility of working coopera-
tively or in a highly autonomous way.
Translation teaching has a long history. However, teaching professional trans-
lation in European universities began less than 50 years ago. Our proposal for the
use of corpora for translating with students has been inspired by three important
contributions to written translation teaching, made in the last thirty years:
Delisle’s L’analyse du discours comme méthode de traduction (1980) marks the be-
ginning of the incorporation of a well-defined methodology based on learning
objectives into translation teaching. In his 1993 publication, he clearly establishes
the elements that a teaching method needs to take into account:
A teaching method must clearly delimit the subject matter to be taught, grade the
difficulties, set learning objectives, specify the means of achieving them, establish
a progression in the training and, finally, provide for ways of evaluating observable
performance.
(Delisle 1993: 15)
or other types of documents. Thirdly, the task-based approach has been success-
fully applied to areas such as language teaching for a long time and has proved to
be very helpful in terms of planning and organising work. Lastly, the task-based
approach stresses the idea of learning through use, i.e. learning through experi-
ence and practice, another fundamental concept of corpus work, given that such
work involves both declarative knowledge (know what) and, most importantly,
operative knowledge (know how, applicable to building and interrogating a cor-
pus, extracting and interpreting relevant data, etc.). In short, learning to translate
and the use of corpora can be combined within a single realistic translation task.
Translation teaching that only draws on the teacher’s experience places com-
plete responsibility for what the students learn on the teacher and his/her intu-
ition about what is right or wrong, common or uncommon. Corpora can provide
alternative sources of authority and have recently been introduced to the world
of translation teaching with a view to providing empirical data and authentic ma-
terial within the classroom. Quantity and quality have also been enhanced, the
former by virtue of there being more texts to consult during the documentation
phase and the latter in the form of translations that, more than ever, resemble
original texts in the target language, avoiding “translationese”. However, despite
the great advantages of using electronic corpora for translation teaching, the ex-
amples included in the literature tend to be rather anecdotal and a well-founded
methodology remains to be established. Corpus linguistics does have its own
methodology, which is based on a process of observation, analysis and generalisa-
tion, but a methodology for translation teaching with corpora has not yet been
explored in depth. We have started to work in this field and are now testing some
materials with students. These materials follow a task-based approach that uses a
methodology organised around systematically, coherently and comprehensively
designed and structured learning objectives, tasks and teaching units.
For those who are not familiar with the task-based approach, tasks are the ba-
sic organisational units of the learning process and they make up larger structures
called teaching units. A learning objective is, in simple terms, what we as teachers
expect our students to achieve, either as regards a single task or a whole teach-
ing unit (Delisle 1993, 1998). Several items need to be taken into consideration
to build a teaching unit, such as the learning context and students’ level of pro-
ficiency. It is vital to assign learning objectives to the unit and to organise it into
tasks. Each task should have its own learning objective(s), a detailed explanation
of what the student has to do, a list of the materials required to carry out the task,
and a description of the evaluation to be applied. In other words, the task-based
approach allows the teacher to organise every aspect of the teaching situation that
he/she has to take into account (what to teach, how to teach, when to teach and
how to evaluate the results of the learning process).
We have designed a full proposal for the use of electronic corpora in translation
teaching for different levels of proficiency in translation and corpus use. This full
proposal, which takes students’ previous knowledge and their learning context
into account, includes learning objectives related to a competence, teaching units
and a proposal for the evaluation of the use of corpora (Rodríguez Inés 2008).
However, due to space limitations, our proposal is only sketched out here.
dent is expected to acquire, they do express themselves in other terms. For ex-
ample, Aston says that “to use corpora effectively in increasingly independent
research, learners need technical, methodological, and conceptual knowledge and
abilities” (Aston 2001: 24) and Zanettin comments that “... translators need to be
able to see patterns and regularities both within a language and across languages”
(Zanettin 1994: 109), while Varantola (2003: 69) lists the competences required to
use corpora in translation, dividing them into two groups:
Corpus compilation
• Corpus design and design criteria.
• Search strategies and search word selection.
• Source criticism to assess the reliability of corpus texts.
• Assessment of corpus adequacy and relevancy.
• Software literacy in general.
• Selection of Internet search engines.
• Integrated use of word processing tools and corpus tools.
Use of corpus information
• Deductive corpus analysis skills in general.
• Use of preliminary corpus information for more targeted compilation criteria.
• Use of corpus evidence for translational decisions.
• Corpus evaluation and decision-making skills.
• Distinctions between permanent corpus collections and targeted, disposable corpora.
• Overall corpus knowledge management skills.
Curriculum design for university degrees should take into account a number
of issues, ranging from formal considerations, such as the number and distribu-
tion of credits required to obtain a degree, to contextual issues, such as students’
profiles and previous training, market expectations, etc. As mentioned previously,
specific competences should be established for degrees in translation and inter-
preting, so that each educational centre interested in following a competence-
based approach would have a catalogue or a list of competences specific to its
discipline and on the basis of which it could work. Bearing all this in mind, we
would like to suggest one specific instrumental sub-competence, namely the ability
to use electronic corpora appropriately in order to solve translation problems.
This specific sub-competence is composed of four elements:
We advocate the use of corpora in translation classes at an early stage of the learn-
ing process, and have established two phases in the pedagogical progression in-
volved in using corpora to translate, namely an introductory phase and a consoli-
dation phase. In the introductory phase, students acquire basic methodological
and technical principles where corpus work is concerned, while the consolida-
tion phase is geared to enabling them to reap the maximum benefit possible from
corpora in translation. The four elements mentioned above are then distributed
between these two phases in the form of general and specific learning objectives,
i.e. statements that describe the results expected after a learning process has taken
place (with objectives of the former variety being less observable than those of
the latter).
The overall aim of the introductory phase is for the student to develop an
understanding of certain basic methodological principles related to the use of
corpora for translation purposes, and to be able to work with corpora at a basic
level. The following three general learning objectives have been defined for the
introductory phase:
principles related to corpus work necessarily constitutes the first stage, it is also
true that the student does not need to assimilate all those principles to be able to
start using corpora in translation. Meanwhile, the student can become familiar
with corpus-related software as part of the process of acquiring some of the afore-
mentioned principles.
The overall aim of the consolidation phase in the use of corpora to translate is
for the student to acquire more advanced knowledge and more specialised abili-
ties with regard to the use of corpora, so as to make the most of such resources
when translating. The following three general learning objectives have been de-
fined for the consolidation phase:
1. To build corpora.
2. To use advanced functions of corpus-related software.
3. To use corpora at an advanced level to solve translation problems.
We have designed several teaching units with the aim of covering most of the
proposed learning objectives. The contents related to these learning objectives
are varied and include the compilation, use and evaluation of different types of
corpora; the use of basic and advanced functions of corpus analysis software; and
the use of corpora to solve translation problems of different kinds, with vary-
ing levels of difficulty and corresponding to a range of fields of specialisation.
A wide range of resources and materials has been used, including dictionaries,
glossaries, thesauri, comparable and parallel corpora, texts of different kinds
and genres, and software such as corpus analysers, text aligners, web downloaders
and off-line browsers. Task activities are also varied, with students working
individually, in pairs and in groups; likewise, evaluation takes the form of
self-assessment, peer assessment or assessment by the teacher. Relevance, variety
and progression in terms of difficulty levels have been the main criteria on which
decisions have been based regarding the design of teaching units, the selection of
materials, how to use these materials and how to evaluate the student’s learning
process.
An example of these units is “Ingredients for my corpus: quality texts”.
Teaching unit: “Ingredients for my corpus: quality texts”
Difficulty level: Consolidation
Subject: Specialised translation (Spanish-English)
Learning objectives:
– To build corpora
   – To develop a critical attitude towards parallel texts
   – To build an ad hoc bilingual comparable corpus
– To use advanced functions of corpus-related software
   – To extract concordances: simple, using truncated searches, using context words (refresher)
   – To find the most appropriate sorting method within a concordance set (refresher)
   – To re-sort the context (refresher)
   – To download texts from the Internet
   – To extract and interpret collocations
   – To extract and interpret clusters
– To use corpora at an advanced level to solve translation problems
   – To select the appropriate corpus/corpora according to translation needs (refresher)
   – To use co-textual information from concordances
Structure:
– Task 1: Identifying potential translation problems in my source text
– Task 2: Analysis of documentation needs
– Task 3: Internet text quality evaluation
– Task 4: Exploring the possibilities of the resource I have created
– Final task: Using an ad hoc corpus to solve translation problems
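Two of the objectives listed above, extracting collocations and extracting clusters, rest on simple counting around a node word. The sketch below is a hypothetical illustration, not how WordSmith Tools implements these functions. Collocates are counted within an assumed window of two words on either side of the node. Clusters are contiguous three-word sequences containing the node. The English sample text is invented. Dedicated tools usually also rank collocates by a statistical association score, whereas this sketch uses raw counts only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())


def collocates(tokens, node, window=2):
    """Count words found within `window` positions to the left or right of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start, stop = max(0, i - window), min(len(tokens), i + window + 1)
            # Skip the node occurrence itself (position i).
            counts.update(t for j, t in enumerate(tokens[start:stop], start) if j != i)
    return counts


def clusters(tokens, node, size=3):
    """Count contiguous `size`-word sequences (clusters) that contain `node`."""
    counts = Counter()
    for i in range(len(tokens) - size + 1):
        gram = tokens[i:i + size]
        if node in gram:
            counts[" ".join(gram)] += 1
    return counts


# Invented sample text (hypothetical data).
SAMPLE = ("The skin lesion is raised. A raised skin lesion may be itchy. "
          "The skin lesion disappears within hours.")

if __name__ == "__main__":
    toks = tokenize(SAMPLE)
    print(collocates(toks, "lesion").most_common(3))
    print(clusters(toks, "lesion").most_common(1))
```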
As can be seen, the teaching unit selected belongs to the consolidation phase
and focuses on learning how to build and use corpora. In this case, it also includes
re-examining learning objectives already dealt with in a previous teaching unit
corresponding to the introductory phase, with a view to emphasising their importance.
The unit was organised into five tasks. The students’ use of corpora was
evaluated in the final task, i.e. the task that tested their ability to achieve
several objectives.
3.3 Evaluation
We will now focus on evaluation and the changes entailed by the use of technical
resources in teaching, as regards what is evaluated and how. To that end, we will
describe the experience we have gained from teaching translation with corpora
and evaluating students’ performance.
The learning process in the use of corpora in translator education 139
In our concept of evaluation, it is not only the translation product that should
be assessed, but also the process in which the students use the resources (corpora)
available. This notion is essential in formative evaluation, which is a source of
information that can help improve translation teaching and learning. In other
words, the purpose of evaluation is not just to assign or receive marks, but for
teachers and students to learn about the learning process and progress. Where
translation is concerned, both the product and the process can be evaluated; it is
simply a case of using different instruments for each aspect. The instruments (us-
ing the term in a broad sense) that we developed to evaluate students’ learning of
the use of corpora throughout the semester were the following:
– An observation chart for the teacher to record the students’ observable prog-
ress.
– A learning diary for the students to keep a record of what they feel they have
learned.
– A self-evaluation questionnaire for the students to assess their own learning
at the end of every teaching unit.
– A questionnaire for the students to comment on the contents and methodology
used at the end of every teaching unit.
Other instruments were designed to test the way corpora are used by students
when translating. These instruments were created to be used in the teaching units
in which, by way of a final task, students are asked to translate a text using corpora
only. The instruments in question are the following:
This paper will subsequently focus on these four instruments and how they were
used. Before doing so, however, we will describe our procedure for preparing the
evaluation of the use of corpora to translate:
3. A percentage of the overall mark was assigned to each item on the basis of the
level of proficiency expected of the students.
4. The source text to be translated was selected.
5. A limited number of translation problems that could be solved by using cor-
pora were selected from the source text, and the questionnaire was prepared
accordingly.
6. The students were provided with the source text, a predetermined set of cor-
pora and the questionnaire on preselected translation problems.
Table 1. Example of correlation between a learning objective and the item to be assessed
Learning objective | Item to be assessed
To identify the importance of looking on both sides of a keyword or term in order to extract conceptual or collocational information | Appropriateness of the sorting of results
In the teaching unit being used here as an example, the items selected were the
following:
3.3.4 Questionnaire
A questionnaire was designed to test the way corpora are used by students at cer-
tain points of the education process in order to evaluate their acquisition of com-
petences and progress in using corpora for translation purposes. A questionnaire
Translate the following text into English. This text is part of a website (http://www.dermovet.com) that offers information and services on veterinary
dermatology to a non-expert or semi-expert readership. The translated text will have the same type of target readers and function as the source text.
Terminología Dermatológica Veterinaria / Centros Veterinarios / Pruebas diagnósticas / Novedades Terapéuticas / Visita Dermatológica
Mascotas...
Mácula: Lesión caracterizada por un cambio de color de la piel, sin elevación ni engrosamiento de la misma. Son focales, bien circunscritas y
de un tamaño inferior a 1 cm. Existen varios tipos: eritematosa (atopía), hiperpigmentada (lentigo), hipopigmentada (vitíligo), hemorrágica
(intoxicación, reacción a fármaco).
¿Qué le sucede a mi perro?
Mancha: Lesión idéntica a la mácula pero de un diámetro mayor a 1 cm, y suelen ser menos bien delimitadas.
Pápula: Área cutánea con relieve, sólida y circunscrita de hasta 1 cm de diámetro. Puede ser folicular o interfolicular. Son indicativas de pioderma
y/o parasitosis, en la mayoría de casos.
Eritema: Enrojecimiento de la piel debido a la vasodilatación de los vasos dérmicos superficiales. Indica inflamación cutánea.
Placa: Lesión aplanada mayor de 1 cm de diámetro, la mayoría de veces debido a la unión de varias pápulas. Son indicativas de pioderma y menos
frecuentemente de enfermedades autoinmunes.
Pústula: Elevaciones bien delimitadas de los estratos superficiales de la epidermis, normalmente con contenido purulento. Hay de varios tipos:
sépticas (piodermas), estériles (Pénfigo Foliáceo), eosinofílicas (alergias, parasitosis), linfocíticas (Linfoma Epiteliotrópico). Pueden ser foliculares
o interfoliculares.
Vesícula: Lesión similar a la pústula pero con contenido seroso o con exudado inflamatorio y de diámetro inferior a 1 cm de diámetro. Se origina
por un acúmulo de fluido en los espacios intercelulares que aparece y desaparece en minutos u horas.
Bulla: Vesícula mayor de 1 cm de diámetro.
Habón: Lesión con relieve consistente en un edema intercelular de las células de la epidermis. Típica de las reacciones de hipersensibilidad tipo I. Es la lesión principal en reacciones de urticaria y en las reacciones positivas al Skin-Test.
¿Qué le sucede a mi gato?
Nódulo: Lesión elevada mayor de 1 cm de diámetro, bien delimitada y sólida. Suelen estar bien infiltradas en la dermis. Típico de neoplasias y
Paniculitis Nodular Estéril.
Tumor: Masas neoplásicas tanto benignas como malignas. Se utiliza cuando hay nódulos muy grandes.
Quiste: Cavidades forradas por epitelio localizadas en el interior de la piel y normalmente con contenido glandular.
Table 2. Questionnaire to be filled in by the student with regard to his/her use
of corpora to solve a translation problem
Problem 3 (Terminology): “habón”
Question | Answer
What word or string of words or characters have you searched for? In which corpus/corpora have you searched? | Search 1: corpus: / Search 2: corpus: / Search 3: corpus:
Have you restricted your search in any way (regional variant, oral or written mode, date, etc.)? |
Have you re-sorted the concordances alphabetically? |
Have you re-sorted the concordances alphabetically to the right or to the left of the keyword? Specify the sorting criterion used (e.g. L1, R1). |
State whether you have used other functions from WordSmith Tools (Grow, Shrink, Clusters, Collocates, etc.). |
Write down the various translation solutions you had considered before making your final choice. |
Justify your solution. |
Solution suggested by the student: “your solution”
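The questionnaire above asks students to list the solutions they considered and to justify their final choice. One simple kind of corpus evidence is how often each candidate equivalent occurs in a target-language comparable corpus. The sketch below assumes an invented three-sentence English corpus and hypothetical candidate equivalents. It is not the procedure prescribed in the course. Frequency alone does not validate a term, so the concordance lines still need to be read to check meaning and register.

```python
import re
from collections import Counter


def term_frequencies(texts, candidates):
    """Whole-word, case-insensitive occurrences of each candidate in the corpus."""
    counts = Counter({cand: 0 for cand in candidates})
    for text in texts:
        for cand in candidates:
            pattern = r"\b" + re.escape(cand) + r"\b"
            counts[cand] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Hypothetical English comparable corpus and candidate equivalents for "habón".
CORPUS_EN = [
    "A wheal formed at the injection site within minutes.",
    "The wheal and flare reaction was measured after the skin test.",
    "Hives can appear anywhere on the body.",
]
CANDIDATES = ["wheal", "weal", "hive"]

if __name__ == "__main__":
    for cand, freq in term_frequencies(CORPUS_EN, CANDIDATES).most_common():
        print(cand, freq)
```

The whole-word match misses the plural Hives. A truncated search such as hive* would catch it, and this kind of variation is why the questionnaire asks students to record the exact search string they used.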
Table 3. Example of correlation between elements within the evaluation proposal
contained herein
Learning objective | Item to be assessed | Question/s (from questionnaire)
To identify the importance of looking on both sides of a keyword or term in order to extract conceptual or collocational information | Appropriateness of the sorting of results | – Have you re-sorted the concordances alphabetically? – Have you re-sorted the concordances alphabetically to the right or to the left of the keyword? Specify the sorting criterion used (e.g. L1, R1).
real problems the students needed to solve with the help of corpora, doing so
would have constituted a completely different study requiring a different perspec-
tive. In our case, we were interested in seeing how the students applied what they
had learned to using the corpora available (if they used them at all) for the pur-
pose of solving various problems (the same problems for every student).
Table 4. Chart for combining the data extracted from each student’s translation
and questionnaire
Student A | Problem 1 | Problem 2 | Problem 3 | Problem 4 | Problem 5 | Partial mark (after application of %) | Final mark (out of 10 after application of %)
Item 1 (value X %) | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | |
Item 2 (value X %) | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | |
Item 3 (value X %) | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | |
Item 4 (value X %) | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | |
Item 5 (value X %) | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | |
Acceptability | √/X | √/X | √/X | √/X | √/X | |
Table 5. Example of a completed chart, combining data from a student’s translation
results and questionnaire
Key:
Problem 1: ¿Qué le sucede a mi...?
Problem 2: Son indicativas de pioderma... / Es indicativo de Pénfigo Foliáceo...
Problem 3: Habón
Problem 4: Tumor: masas neoplásicas tanto benignas como malignas. Se utiliza cuando hay
nódulos muy grandes.
Problem 5: estrato córneo / basal
Item 1: Appropriateness of the corpus/corpora selected
Item 2: Appropriateness of the search string entered
Item 3: Appropriateness of the search restrictions applied
Item 4: Appropriateness of the sorting of results
Item 5: Appropriateness of the use of available software functions
0: Incorrect
1: Improvable
2: Correct
Acceptability (of the equivalent proposed)
√: Right
X: Wrong
Table 5 (continued)
Student A | Scores on the problems assessed | Partial mark | Final mark (out of 10 after application of %)
Item 1 (value 20%) | 2 2 2 2 | 8/8 = 1 | 9.2
Item 2 (value 20%) | 2 2 2 2 | 8/8 = 1 |
Item 3 (value 20%) | 2 0 2 | 4/6 = 0.6 |
Item 4 (value 20%) | 2 2 2 | 6/6 = 1 |
Item 5 (value 20%) | 2 2 2 2 | 6/6 = 1 |
Acceptability (Problems 1–5) | √ √ √ X √ | |
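The arithmetic behind Tables 4 and 5 can be made explicit. As the completed chart suggests, each item’s partial mark is the number of points obtained divided by the number of points possible (2 per problem assessed). The final mark is ten times the weighted sum of the partial marks. One assumption is needed to reproduce the chart. The chart records 4/6 as 0.6, so this sketch truncates partial marks to one decimal place. With exact fractions, the final mark would be about 9.33 rather than 9.2. The function names are hypothetical, and the scores are those of Student A in Table 5, listed only for the problems on which each item was assessed.

```python
import math
from fractions import Fraction

MAX_SCORE = 2  # 0 = incorrect, 1 = improvable, 2 = correct


def partial_mark(scores):
    """Points obtained / points possible for the problems on which the item was
    assessed (None marks a problem where it was not assessed).
    Truncated to one decimal place, an assumption that matches the chart's 4/6 = 0.6."""
    assessed = [s for s in scores if s is not None]
    if not assessed:
        raise ValueError("item was not assessed on any problem")
    ratio = Fraction(sum(assessed), MAX_SCORE * len(assessed))
    return math.floor(ratio * 10) / 10


def final_mark(item_scores, weights):
    """Weighted sum of partial marks, expressed out of 10 and rounded to one decimal."""
    total = sum(weights[item] * partial_mark(scores) for item, scores in item_scores.items())
    return round(total * 10, 1)


# Student A's scores from Table 5 (assessed problems only) and the 20% weights.
STUDENT_A = {
    "Item 1": [2, 2, 2, 2],
    "Item 2": [2, 2, 2, 2],
    "Item 3": [2, 0, 2],
    "Item 4": [2, 2, 2],
    "Item 5": [2, 2, 2, 2],
}
WEIGHTS = {item: 0.2 for item in STUDENT_A}

if __name__ == "__main__":
    for item, scores in STUDENT_A.items():
        print(item, partial_mark(scores))
    print("Final mark:", final_mark(STUDENT_A, WEIGHTS))
```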
4. Results
The teaching unit presented here was tested on a group of 26 final-year Spanish
students taking a course in specialised translation into English. At the end of this
unit, students were asked to fill in two questionnaires. The aim of the first was for
them to rate, on a scale of 1 to 10, their acquisition of the competences to which
the unit was geared and the benefits of using corpora to translate. The purpose of
the second was to collect their opinions and comments on their level of satisfac-
tion with the contents of the unit and the methodology used. The results from the
first questionnaire showed that most students were confident that they had
acquired the competences to which the unit was geared. For example, the students’
average ratings for statements such as “I am able to use a monolingual comparable
corpus in order to translate a text” and “Using corpora has helped me to feel
more confident about my translation solutions” were 7.9 and 8.3, respectively. The
students’ evaluation of the unit’s contents and methodology was also positive; for
example, they stressed that using corpora enabled them to save time when
translating. While collecting quality texts to build a corpus may take some time, the
resulting resource helps to solve many of the translation problems that arise. Fur-
thermore, students said that learning how to use the program WordSmith Tools
helped them a great deal in terms of reaping the full benefits of parallel texts (i.e.
finding terms using co-textual information, appropriate use of these terms, use of
authentic English syntax, etc.).
5. Conclusions
This paper has asserted the need for evolution and the adoption of new method-
ologies in translation teaching. As stated, working with corpora brings authentic
material and empirical data to language and translation research and teaching.
More than ever, the use of corpora has made it possible to focus on the student,
the translation task and the resources used, rather than on the teacher. Further-
more, corpora (as resources) and corpus linguistics (as a methodology and a new
way of approaching language work) promote a sense of discovery that increases
motivation and student autonomy, in addition to encouraging the use of IT tools
and the processing of information in electronic format.
Looking ahead, a task-based approach built around learning objectives linked
to competences can provide a methodological framework for teaching translation
with corpora, i.e. a teaching method that is systematic and comprehensive in that
it integrates all the elements involved in education, making the processes of
teaching and learning more coherent. Learning corpus use to translate is not just
about teaching and learning how to use tools, but also about following a
methodology that makes the process more systematic.
148 Patricia Rodríguez Inés
As stated earlier in this paper, if learning to use tools is a learning objective,
then this learning should be evaluated.
Although the type of evaluation proposed here still needs refining, it can,
within certain limits, provide information about how students use a corpus and
about the origin of their translation errors. It also provides data that can be
used to improve teaching and learning, since it helps to reveal where a learning
objective has not yet been fully achieved. In that case, the teacher can go back
and modify the task corresponding to the learning objective in question, or even
design new tasks to ensure that the objective is met.
References
Aston, G. 2001. “Learning with corpora: An overview”. In Learning with corpora, G. Aston
(ed.), 7–45. Bologna: CLUEB.
Beeby, A. 1996. Teaching Translation from Spanish to English. Ottawa: University of Ottawa
Press.
Bowker, L. 1999. “Using a corpus to assess student translations: A pilot study”. In PALC’99:
Practical Applications in Language Corpora. Papers from the International Conference at
the University of Lódz, 15–18 April 1999, B. Lewandowska-Tomaszczyk and P. James Melia
(eds), 529–540. Bern: Peter Lang.
Delisle, J. 1980. L’analyse du discours comme méthode de traduction. Cahiers de Traductologie
2. Université d’Ottawa.
Delisle, J. 1993. La traduction raisonnée. Manuel d’initiation à la traduction professionnelle
de l’anglais vers le français. Col. Pédagogie de la traduction. Les Presses de l’Université
d’Ottawa.
Delisle, J. 1998. “Définition, rédaction et utilité des objectifs d’apprentissage en enseignement
de la traduction”. In Los estudios de traducción: un reto didáctico, I. García Izquierdo and
J. Verdegal (eds). Col. Estudis sobre la traducció 5. Universitat Jaume I.
González Davies, M. (coord.). 2003. Secuencias. Tareas para el aprendizaje interactivo de la
traducción especializada. Barcelona: Octaedro.
Hurtado, A. 1992. “Didactique de la traduction des textes spécialisés”. In Actes de la 3ème
Journée ERLA-GLAT. Lexique spécialisé et didactique des langues, 9–21. Brest: UBO-ENST.
Hurtado, A. 1999. Enseñar a traducir. Madrid: Edelsa.
Hurtado, A. 2007. “Competence-based curriculum design for training translators”. The
Interpreter and Translator Trainer (ITT) 1 (2): 163–195.
Kiraly, D. 2000. A social constructivist approach to translator education. Manchester:
St. Jerome.
PACTE. 2003. “Building a Translation Competence Model”. In Triangulating Translation:
Perspectives in process oriented research, F. Alves (ed.), 43–66. Amsterdam: John Benjamins.
Rodríguez Inés, P. 2008. Uso de corpus electrónicos en la formación de traductores
(inglés-español-inglés). PhD thesis. Departament de Traducció i d’Interpretació.
Universitat Autònoma de Barcelona.
The learning process in the use of corpora in translator education 149