Beeby Corpus Use and Translating Corpus Use For Learning To Translate and Learning Corpus Use To Translate

Corpus Use and Translating
Benjamins Translation Library (BTL)

The BTL aims to stimulate research and training in translation and interpreting
studies. The Library provides a forum for a variety of approaches (which may
sometimes be conflicting) in a socio-cultural, historical, theoretical, applied and
pedagogical context. The Library includes scholarly works, reference books, post-
graduate text books and readers in the English language.
EST Subseries
The European Society for Translation Studies (EST) Subseries is a publication
channel within the Library to optimize EST’s function as a forum for the
translation and interpreting research community. It promotes new trends in
research, gives more visibility to young scholars’ work, publicizes new research
methods, makes available documents from EST, and reissues classical works in
translation studies which do not exist in English or which are now out of print.
General Editor: Yves Gambier, University of Turku
Associate Editor: Miriam Shlesinger, Bar-Ilan University Israel
Honorary Editor: Gideon Toury, Tel Aviv University
Advisory Board
Rosemary Arrojo, Binghamton University
Zuzana Jettmarová, Charles University of Prague
Rosa Rabadán, University of León
Michael Cronin, Dublin City University
Werner Koller, Bergen University
Sherry Simon, Concordia University
Daniel Gile, Université Paris 3 - Sorbonne Nouvelle
Alet Kruger, UNISA, South Africa
Mary Snell-Hornby, University of Vienna
José Lambert, Catholic University of Leuven
Sonja Tirkkonen-Condit, University of Joensuu
Ulrich Heid, University of Stuttgart
John Milton, University of São Paulo
Maria Tymoczko, University of Massachusetts Amherst
Amparo Hurtado Albir, Universitat Autònoma de Barcelona
Franz Pöchhacker, University of Vienna
Lawrence Venuti, Temple University
W. John Hutchins, University of East Anglia
Anthony Pym, Universitat Rovira i Virgili
Volume 82
Corpus Use and Translating. Corpus use for learning to translate
and learning corpus use to translate
Edited by Allison Beeby, Patricia Rodríguez Inés and Pilar Sánchez-Gijón
Corpus Use and Translating
Corpus use for learning to translate and learning
corpus use to translate
Edited by
Allison Beeby
Patricia Rodríguez Inés
Pilar Sánchez-Gijón
Universitat Autònoma de Barcelona
John Benjamins Publishing Company

Amsterdam / Philadelphia
The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.
Library of Congress Cataloging-in-Publication Data
Corpus use and translating : corpus use for learning to translate and learning corpus
use to translate / edited by Allison Beeby, Patricia Rodríguez Inés and Pilar
Sánchez-Gijón.
p. cm. (Benjamins Translation Library, ISSN 0929-7316 ; v. 82)
Includes bibliographical references and index.
1. Translating and interpreting--Data processing. 2. Corpora (Linguistics) 3.
Translators--Training of. I. Beeby Lonsdale, Allison. II. Rodríguez Inés,
Patricia. III. Sánchez-Gijón, Pilar.
P309.C67 2009
440--dc21 2008041947
ISBN 978 90 272 2426 2 (Hb; alk. paper)
ISBN 978 90 272 9106 6 (eb)
© 2009 – John Benjamins B.V.

No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any
other means, without written permission from the publisher.
John Benjamins Publishing Co. · P.O. Box 36224 · 1020 ME Amsterdam · The Netherlands
John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA
Table of contents
List of editors and contributors
Foreword
Guy Aston
Introduction
Allison Beeby, Patricia Rodríguez Inés and Pilar Sánchez-Gijón
Using corpora and retrieval software as a source of materials for the translation classroom
Josep Marco and Heike van Lawick
Safeguarding the lexicogrammatical environment: Translating semantic prosody
Dominic Stewart
Are translations longer than source texts? A corpus-based study of explicitation
Ana Frankenberg-Garcia
Arriving at equivalence: Making a case for comparable general reference corpora in translation studies
Gill Philip
Virtual corpora as documentation resources: Translating travel insurance documents (English-Spanish)
Gloria Corpas Pastor and Miriam Seghiri
Developing documentation skills to build do-it-yourself corpora in the specialised translation course
Pilar Sánchez-Gijón
Evaluating the process and not just the product when using corpora in translator education
Patricia Rodríguez Inés
Subject index

List of editors and contributors
Editors:

Allison Beeby
Universitat Autònoma de Barcelona, Spain
Departament de Traducció i d’Interpretació
Edifici K
08193 Bellaterra
Barcelona
Spain
allison.beeby@uab.es

Patricia Rodríguez Inés
Universitat Autònoma de Barcelona, Spain
Departament de Traducció i d’Interpretació
Edifici K
08193 Bellaterra
Barcelona
Spain
patricia.rodriguez@uab.es

Pilar Sánchez-Gijón
Universitat Autònoma de Barcelona, Spain
Departament de Traducció i d’Interpretació
Edifici K
08193 Bellaterra
Barcelona
Spain
pilar.sanchez.gijon@uab.es

Contributors:

Patricia Rodríguez Inés
Universitat Autònoma de Barcelona, Spain
Departament de Traducció i d’Interpretació
Edifici K
08193 Bellaterra
Barcelona
Spain
patricia.rodriguez@uab.es

Pilar Sánchez-Gijón
Universitat Autònoma de Barcelona, Spain
Departament de Traducció i d’Interpretació
Edifici K
08193 Bellaterra
Barcelona
Spain
pilar.sanchez.gijon@uab.es

Gloria Corpas Pastor
Universidad de Málaga, Spain
Departamento de Traducción e Interpretación
Avda. Cervantes, 2
29071 Málaga
Spain
gcorpas@uma.es

Míriam Seghiri Domínguez
Universidad de Málaga, Spain
Departamento de Traducción e Interpretación
Avda. Cervantes, 2
29071 Málaga
Spain
seghiri@uma.es

Ana Frankenberg-Garcia
Instituto Superior de Línguas e Administração, Lisboa
Rua Professor Dias Valente 168–8º Dtº
2765–578 Estoril
Portugal
ana.frankenberg@sapo.pt, ana_frankenberg@hotmail.com

Gill Philip
Università degli Studi di Bologna
CILTA – Centro Interfacoltà di Linguistica Teorica ed Applicata “Luigi Heilmann”
Piazza San Giovanni in Monte, 4
40124 Bologna
Italy
g.philip.polidoro@gmail.com

Josep Marco
Universitat Jaume I, Spain
Departament de Traducció i Comunicació
Campus de Riu Sec
12071 Castelló de la Plana
Spain
jmarco@trad.uji.es

Dominic Stewart
Università di Macerata
Facoltà di Lettere e Filosofia
Palazzo Ugolini
Via Morbiducci 40–62100 Macerata
Italy
dominic.stewart@unimc.it

Heike van Lawick
Universitat Jaume I
Departament de Traducció i Comunicació
Campus de Riu Sec
12071 Castelló de la Plana
Spain
lawick@trad.uji.es
Foreword
Guy Aston
University of Bologna at Forlì, Italy
The Corpus Use and Learning to Translate workshops were born out of two be-
liefs. First, that language corpora, if selected and used appropriately, are able to
provide more abundant and reliable information to the translator than traditional
reference tools, such as dictionaries and “parallel texts”. Second, as previous work
in foreign language teaching had suggested, that corpora are able to offer learn-
ing environments which empower learners and increase their autonomy, allowing
them to develop their knowledge and awareness while at the same time providing
them with a range of opportunities for using the language – amongst which, for
engaging in translation.
Much of the discussion within CULT has focussed on the potential of different
types of corpora – from large established monolingual mixed reference corpora
to small do-it-yourself specialised monolingual or comparable ones, from paral-
lel corpora of original texts and their official translations to corpora of learner
translations. A lot of work has gone into developing better ways of constructing
appropriate corpora, and better tools to interrogate them within a “translator’s
workbench”. At the same time, there has been continual discussion of how we can
develop translators’ ability to exploit corpora effectively.
The difficulty of using corpora is that they rarely provide immediate answers
to a translator’s problems. Unlike translation memory or machine translation sys-
tems, they do not instantly present a preferred candidate for the user to accept,
modify or reject. Corpus data has to be interpreted and evaluated comparatively
to reach conclusions, and this requires not only technical skill (perhaps the least
of the problems, since learners’ computational competence is often greater than
their teachers’), but above all critical thought. Training would-be translators to
use corpora goes hand in hand with educating them to think about the translation
process and the learning process, developing their sensitivity as to how they can
use corpora in these processes.
It is difficult to deny that corpus use is anti-economic in the short term, and
this is probably why, while increasingly taught in translation schools, it has not
yet become widely established among professional translators. Regardless of its
potential to improve translation quality and to provide a fruitful learning envi-
ronment, corpus consultation remains time-consuming, and corpus construction
enormously more so. One part of the problem is whether and how we can im-
prove the efficiency of corpus use for the translator, facilitating both consultation
and construction, and do so without compromising its quality as a translating and
learning tool. A second part of the problem, however, concerns attitude. Not all
translators, be they learners or professionals, appreciate that corpus use may have
a medium- and long-term payoff which can override what they often perceive as
short-term disadvantages. Following on from the papers from the first two work-
shops (Bernardini & Zanettin 2000; Zanettin et al. 2003), this third CULT volume
offers further contributions for a debate which is far from being concluded.
References
Bernardini, S. & F. Zanettin (eds.) (2000). I corpora nella didattica della traduzione – Corpus Use
and Learning to Translate. Bologna: Cooperativa Libraria Universitaria Editrice.
Zanettin, F., S. Bernardini & D. Stewart (eds.) (2003). Corpora in Translator Education. Man-
chester: St Jerome.
Introduction
Allison Beeby, Patricia Rodríguez Inés and Pilar Sánchez-Gijón
Universitat Autònoma de Barcelona, Spain
Corpus Use and Translating is mainly addressed to those interested in translation training and focuses on ways of getting the best out of electronic corpora. The first part, Corpus use for learning to translate, will give ideas to teachers who want to prepare learning materials and tasks using corpora. The second part, Learning corpus use to translate, is about helping students to become autonomous users of corpora as part of their translation competence. In the past, students learnt to translate without using electronic corpora, and obviously many will continue to do so, but there are significant advantages to be gained from learning corpus use to translate. Professional translators are always under pressure to meet deadlines and take short cuts. Only during their training at university do they have the time to learn strategies and methodologies that will help to improve the quality and quantity of their production, and learning corpus use is one of these.
Our book is a continuation of the CULT (corpus use and learning to translate) tradition. This is a relatively recent tradition, as it was only at the end of the 20th century that computers powerful enough to cope with large electronic corpora became available to ordinary translators, translator trainers and trainees. Mona Baker (1993), who had worked with John Sinclair on the use of corpora in lexicography, was a pioneer in suggesting the implications and applications of corpus linguistics for translation studies. Guy Aston (1999), who had been working on corpora and language acquisition, provided the idea for the CULT conferences. The first two were held in 1997 and 2000 in Bertinoro, Italy, and were organised by Silvia Bernardini, Federico Zanettin and Dominic Stewart from the University of Bologna at Forlì. When Patricia Rodríguez Inés was in Forlì in 2002, she was persuaded to carry the flame back to Barcelona, where the third CULT conference was held in 2004. Most, but not all, of the contributions to this volume developed out of the Barcelona conference (CULT BCN), which may explain the book’s Spanish flavour. Further background to the disciplines involved in CULT can be found in different chapters of this book: corpus linguistics (Corpas Pastor and Seghiri Domínguez), corpus-based translation studies (Stewart, Frankenberg-Garcia and Philip), and corpora in language teaching and translator training (Marco and Van Lawick, Sánchez-Gijón and Rodríguez Inés).

1. A selection of papers from the 1997 conference was included in Bernardini and Zanettin (2000).
2. Some of the work presented at the CULT2K conference provided the kernel for Zanettin, Bernardini and Stewart (2003).
Many of the issues addressed in this book are related to questions that were raised at CULT2K. What is the role of corpora in documentation for translators? Why bother with corpora when we have the Internet? If we use ad hoc or disposable corpora, how can we be sure they are reliable or representative? Is the time needed to learn how to build and use corpora worth the effort? As Silvia Bernardini said in Barcelona in 2004, we are still looking for a balance between training and education, between Gouadec’s claim (2002/2007) that “no serious translator training programme can be dreamt of unless the training environment emulates the work station of professional translators” and Mossop’s reminder (1998) that “if you can’t translate with pencil and paper, then you can’t translate with the latest information technology.” Translation faculties have to find this balance between the positions of Gouadec and Mossop if their graduates are to survive in the real world of the professional translator in the 21st century.
Translation has been an object of research in Artificial Intelligence and the computer sciences, and attempts have been made to make the translation process partially or totally automatic. Fully automatic translation programs remain a chimera, and researchers have turned to less ambitious but more productive projects. Some of these have revolutionized both the way translators work today (for example, computer-assisted translation programs) and the way they solve problems (for example, terminology databases). Corpus linguistics has also been added to the technology-based battery of resources at the translator’s disposal. However, in the case of corpus linguistics, the technology is accompanied by a methodology, as well as by a number of free-access corpora and the possibility of building a corpus with relative ease. Corpus linguistics tools allow translators to approach texts, their own and those of others, and analyse them both quantitatively and qualitatively.
Translator trainers have been using these tools in the classroom for over a decade, both in general and specialised translation and in both directions, B-A/A-B (translating into/out of the translator’s language of habitual use). Corpora have proved to be very useful when trainee translators are working into a foreign language and have to compensate for insecurities in the target language and culture. Several of the contributors to this book have used corpora to teach translation into a foreign language (Stewart 2001; Corpas 2001; Rodríguez Inés 2008). Furthermore, other authors in previous CULT publications have published extensively on the use of corpora in language learning, B-A/A-B translation and terminology for translation trainees (Aston 2000; Bernardini 2000; Zanettin 2001; Varantola 2003; Kübler 2003; Maia 2003), to name but a few.
European universities are facing the challenges of the Bologna reform and
many are still searching for a balance between training and education. It would
be an advantage if some kind of consensus could be reached about this balance
as one of the goals of this reform is to promote comparability and compatibility
amongst the universities and thus facilitate mobility. The European credit sys-
tem is one of the most important tools being used to make the European Higher
Education Area a reality. Translation faculties are usually ahead in mobility pro-
grammes and over the years have made good use of the different student and
teacher mobility schemes offered by the EU, so student exchanges are the norm
rather than the exception.
Another aspect of this reform that is reflected in the credit system has more
profound educational implications. This is the description of the credit in terms
of students’ activities and the competences that they acquire. The emphasis is very
much on giving students more autonomy and responsibility for developing ap-
propriate competences. In the case of translation competence, the sub-compe-
tences required involve mainly procedural rather than declarative knowledge,
learning to use strategies and methodologies. One of the advantages of CULT is
that the teacher is no longer the sole source of information and authority, the only
specialist available to the students. Corpus methodology reinforces autonomy
and responsibility.
We have mentioned some of the key concepts of the European Higher Edu-
cation Area: comparability and compatibility in order to facilitate mobility and
increased student autonomy and responsibility. Other key concepts in the Bo-
logna recommendations are employability and competitiveness. Certainly, the
translator’s ability to produce high quality translations will be essential to get and
keep a job, and some of the so-called transversal competences (work habits, team
and leadership skills) are also important. However, to be competitive in the 21st
century translators cannot do without information technology skills.
CULT can reinforce basic IT skills and introduce others. For example, in specialised translation courses, learning corpus use to translate can also be used to integrate other kinds of declarative and procedural knowledge needed for translation competence, such as field-specific knowledge of specialised genres, documentation, terminology, IT and translator tools. Of course, all these can be taught as separate subjects, but it is probably more efficient to teach them as part of a translation task in a specialised translation course. Furthermore, integrating the different kinds of knowledge to solve problems should encourage critical thinking and help teachers to find the right balance between education and training. The choice of methodology will depend on the objectives of the teaching module, the field and the kinds of corpora used.

3. The PACTE group (Process in the Acquisition of Translation Competence and Evaluation), which has been conducting empirical research on translation competence (TC) and its components since 1997, has put forward a TC model in which “translation competence is the underlying system of knowledge needed to translate. It includes declarative and procedural knowledge, but the procedural knowledge is predominant” (PACTE 2003: 43–66).
As was mentioned above, the contributions to this volume fall into two main
sections that are reflected in the subtitle of the book: Corpus use for learning to
translate and learning corpus use to translate.
The first part, Corpus use for learning to translate, or corpora as a source of ma-
terials for translator training, is the least controversial. This is a methodology that
has been used widely in language teaching and the corpora are selected and con-
trolled by the teacher to provide real life examples and exercises. The time spent
learning corpus use is invested by the teacher, who then has a marvellous tool with
which to produce teaching materials that can be used for very specific learning
tasks directed at the needs of a particular group of students. Student-centred teach-
ing has obvious pedagogical advantages. Depending on the nature of the tasks,
the students’ learning can be deductive or inductive and they can see that there
are other sources of authority apart from the teacher’s ‘intuition’. It is true that this
use of corpora to develop teaching materials is well established for learning about
certain aspects of translation related to contrastive language or terminology. The
first chapter in this section falls into this category. However, corpus methodology can also be used to prepare classroom materials designed to raise awareness of more complex or lesser-known phenomena, for example, semantic prosody in Chapter 2 and explicitation as a translation universal in Chapter 3.
In the first chapter, ‘Using corpora and retrieval software as a source of materials for the translation classroom’, Josep Marco and Heike Van Lawick provide a useful introduction for teachers who want to begin working in this field. The authors offer a brief review of the origins of corpus-related resources in translator training and of the distinction between corpus-based and corpus-driven learning. In the first case, teachers select material from corpora to design classroom materials for specific objectives. In the second case, students have access to an enormous range of language data, but they have to learn how to use these data for autonomous learning.
4. See: Beeby (1996), Hurtado (1999), Kiraly (2000), González Davies (2004).

Within a task-based methodological framework, the authors present four
tasks based on different types of corpora and designed to illustrate contrastive ob-
jectives. The tasks are for novice translators doing non-specialised B-A translation
(translation from a foreign language into the language of habitual use). The first
three tasks are relatively simple corpus-based cloze and multiple choice exercises,
but the last suggests how the students can be guided towards semi-autonomous
or fully-autonomous corpus-driven activities.
Chapter 2, ‘Safeguarding the lexicogrammatical environment: Translating semantic prosody’, by Dominic Stewart, focuses on semantic prosody in translation training, a serious problem to which corpus use can make an important contribution. Stewart reviews studies of semantic prosody in corpus linguistics and in translation, and the role of semantic prosody in the translation process. To illustrate the problem, he provides data from an awareness-raising module taught to final-year Italian students at the School for Interpreters and Translators in Forlì. They were asked to translate James Joyce’s The Dead before a classroom discussion of the notion of semantic prosody, assisted by concordance data from the British National Corpus for three phrases previously selected by the author. After the discussion in class, the students were asked to revise their translations. The comparison of the first and second versions seemed to show greater awareness of semantic prosody, but Stewart questions his own methodology and the way the corpus analysis was carried out (for example, the fact that his own intuitions guided the searches), and recommends a ‘cult of caution within CULT’ as to the empirical objectivity of using corpora.
The third chapter, ‘Are translations longer than source texts? A corpus-based study of explicitation’, is by Ana Frankenberg-Garcia. The author focuses on a universal of translation, voluntary explicitation, and on differences in text length between source texts and their translations, making use of a parallel corpus of original and translated texts in English and Portuguese. In the search for data, the author discusses the problems of comparing word, character and morpheme counts across languages. The data obtained from the English original and translated texts in the corpus are cross-checked against those obtained from the Portuguese original and translated texts, in order to rule out the possibility that one set of texts is shorter or longer than the other merely because of language-specific differences. The results extracted from the sample show a general tendency for translations to be longer than original texts. The author suggests that translators-to-be should be made aware of phenomena such as explicitation in translation and its possible causes, because this awareness will improve their decision-making in the translation process.
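The logic of crossing the two translation directions can be made concrete with a short Python sketch. It is not Frankenberg-Garcia’s procedure: the whitespace word count, the function names (`word_count`, `length_ratio`, `longer_in_both_directions`) and the simple “ratio above 1” test are assumptions made here purely for illustration.

```python
def word_count(text: str) -> int:
    # Whitespace-delimited word count. A deliberate simplification: word,
    # character and morpheme counts behave differently across languages.
    return len(text.split())


def length_ratio(pairs: list[tuple[str, str]]) -> float:
    """Total translation length divided by total source length.

    `pairs` holds (source_text, translated_text) tuples for ONE direction.
    """
    source_words = sum(word_count(src) for src, _ in pairs)
    target_words = sum(word_count(tgt) for _, tgt in pairs)
    if source_words == 0:
        raise ValueError("source texts contain no words")
    return target_words / source_words


def longer_in_both_directions(en_to_pt, pt_to_en) -> bool:
    """True if translations are longer than their originals in BOTH directions.

    If only EN->PT translations were longer, that could simply mean Portuguese
    needs more words than English; lengthening in both directions points
    instead to something the translation process itself adds.
    """
    return length_ratio(en_to_pt) > 1 and length_ratio(pt_to_en) > 1
```

A ratio above 1 in only one direction would still be compatible with a purely language-specific difference, which is why the sketch requires both.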
Chapter 4, ‘Arriving at equivalence: Making a case for comparable general reference corpora in Translation Studies’, by Gill Philip, defends the use of comparable reference corpora to help trainee translators identify the effect of creative and idiosyncratic language in the source text and achieve equivalence in the target text. According to Philip, parallel corpora are usually neither large
nor wide-ranging enough to be able to provide much information on generalised
norms within the languages involved. Philip bases her conclusions on a corpus-
driven study of connotation in non-literary language where she examines the
meaning of colour words in conventional expressions such as to see red, to feel
blue, and green with envy, and explains what factors are responsible for activating
the connotative meanings of the colour words when the expressions are used in
running text.
The second part of this volume, dedicated to Learning corpus use to translate,
is perhaps more obviously related to the issues raised by the Bologna reform as
the authors of the three chapters in this section are involved in designing teach-
ing modules using the European credit system at both undergraduate and post-
graduate level. They belong to a new generation of translator trainers who grew
up using computers, have degrees in translating and interpreting and experience
as professional translators. Corpas and Seghiri address the problem of evaluating the representativeness of corpora built as documentation resources. Sánchez-Gijón suggests a CULT-based methodology to integrate the learning of documentation, corpus linguistics and terminology in specialised translation courses. Rodríguez Inés offers a methodology for evaluating this learning process.
Chapter 5, ‘Virtual corpora as documentation resources: Translating travel
insurance documents’, is by Gloria Corpas Pastor and Miriam Seghiri Domín-
guez. The authors stress the importance of documentation as a core subject in the
curriculum of Translation and Interpreting degrees, present a brief introduction
to the literature on corpus compilation and go on to provide a systematic meth-
odology for corpus compilation based on electronic resources available on the
Internet. The authors also describe their own software application, ReCor, which enables accurate evaluation of corpus representativeness by measuring lexical density (the ratio between types and tokens, i.e. between the number of different words in a text and the total number of words). The corpus is considered representative if its lexical density stabilises, that is, if it no longer alters appreciably when more texts are added. The protocol and ReCor are
illustrated through the example of the creation of a virtual corpus of travel insur-
ance documents in English and Spanish, which is later tested for representative-
ness. Finally, the pedagogical applications of this research are stressed and some
specific examples are given of possible uses in B-A/A-B translations of travel in-
surance documents.
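The idea of checking representativeness through the type/token ratio can be sketched in a few lines of Python. This is not ReCor’s implementation: the tokenizer, the function names (`tokenize`, `cumulative_ttr`, `has_stabilised`) and the `tolerance` and `window` thresholds are hypothetical choices made for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    # Crude tokenizer (an assumption, not ReCor's): lowercase runs of letters,
    # so accented Spanish words such as "póliza" are kept whole.
    return re.findall(r"[^\W\d_]+", text.lower())


def cumulative_ttr(texts: list[str]) -> list[float]:
    """Type/token ratio of the growing corpus after each text is added."""
    types: set[str] = set()
    tokens = 0
    curve: list[float] = []
    for text in texts:
        words = tokenize(text)
        types.update(words)
        tokens += len(words)
        curve.append(len(types) / tokens if tokens else 0.0)
    return curve


def has_stabilised(curve: list[float], tolerance: float = 0.01, window: int = 3) -> bool:
    """True if each of the last `window` additions moved the ratio by less than `tolerance`.

    Both thresholds are hypothetical defaults, not values taken from ReCor.
    """
    if len(curve) <= window:
        return False
    recent = curve[-(window + 1):]
    return all(abs(later - earlier) < tolerance for earlier, later in zip(recent, recent[1:]))
```

Because a raw type/token ratio tends to fall as a corpus grows, the check looks for a flattening of the curve rather than for any fixed target value.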
In Chapter 6, ‘Developing documentation skills to build do-it-yourself corpo-
ra in the specialised translation course’, Pilar Sánchez-Gijón defends the use of do-
it-yourself corpora in the specialised translation class with a proposal for a CULT-
based methodology to integrate documentation, corpus linguistics, terminology
and translation skills. She starts with the specialised translator’s needs, the role of
documentation in the curriculum and the advantages of creating do-it-yourself
corpora and improving search strategies to retrieve relevant texts. The example
used to illustrate her proposal includes suggestions not only for solving terminol-
ogy problems, but also textual problems involving the target text reader, contras-
tive rhetoric and the degree of formality required when translating from English
to Spanish.
In Chapter 7, ‘Evaluating the process and not just the product when using
corpora in translator education’, Patricia Rodríguez Inés is also concerned with
the demands of the translation profession and believes that the reforms implicit
in the Bologna Declaration and the European Space for Higher Education (e.g.
promoting curriculum innovation based on learning outcomes, profession-ori-
ented learning objectives, lifelong learning etc.) should help trainee translators to
face these demands successfully: to develop expert knowledge and competences,
to gain autonomy and be able to find strategies to solve new problems using new
technologies. The chapter begins by justifying the theoretical and methodologi-
cal framework chosen for a task-based proposal for teaching the use of electronic
corpora to trainee translators. However, the main contribution is a proposal for
evaluating the learning process, not just the final product, by recognising good
practices, appropriateness, quality and acceptability. The proposed evaluation is
part of a teaching unit for final year undergraduate students, ‘Ingredients for my
corpus: quality texts’. It is designed to build up responsibility and autonomy and
the evaluation includes self-assessment, peer assessment and teacher assessment.
Both aspects of CULT, Corpus use for learning to translate and learning cor-
pus use to translate, are a real possibility in most European translation faculties,
with increasingly sophisticated computers and software specially designed to
make the most of the enormous possibilities of existing corpora and the Inter-
net. However, CULT should always be part of a pedagogically sound syllabus in
which all aspects of education are taken into account. CULT is only one aspect of
a translator’s training and, despite the technological advances, time is needed to
train corpus users in good practices and to give them the knowledge and the tools
to build reliable, representative corpora. We think that the time is well spent and
hope that this book will encourage ‘novice’ CULT teachers to experiment as well
as suggest a few new ideas to the ‘experts’.
References
Aston, G. 1999. “Corpus Use and Learning to Translate”. Textus XII(2) (special issue: ‘Translation Studies Revisited’): 289–314.
Aston, G. 2000. “Corpora and language teaching”. In Rethinking language pedagogy from a cor-
pus perspective, L. Burnard and T. McEnery (eds), 7–17. Bern: Peter Lang.
Baker, M. 1993. “Corpus Linguistics and Translation Studies: Implications and Applications”. In Text and Technology: In Honour of John Sinclair, M. Baker, G. Francis and E. Tognini-Bonelli (eds), 233–252. Amsterdam/Philadelphia: John Benjamins.
Beeby, A. 1996. Teaching Translation from Spanish to English. Ottawa: Ottawa University Press.
Bernardini, S. 2000. “Systematising serendipity: Proposals for concordancing large corpora
with language learners”. In Rethinking Language Pedagogy from a Corpus Perspective,
L. Burnard and T. McEnery (eds), 225–234. Bern: Peter Lang.
Bernardini, S. and Zanettin, F. 2000. I corpora nella Didattica della Traduzione – Corpus Use and
Learning to Translate. Bologna: CLUEB.
Corpas Pastor, G. 2001. “Compilación de un corpus ad hoc para la enseñanza de la traduc-
ción inversa especializada”, Trans 5: 155–184. (Also available at: http://www.trans.uma.
es/Trans_5/t5_155-184_GCorpas.pdf)
González Davies, M. 2004. Multiple Voices in the Translation Classroom. Amsterdam/Philadel-
phia: John Benjamins.
Gouadec, D. 2002. Profession: Traducteur. Paris: La Maison du Dictionnaire.
Gouadec, D. 2007. Translation as a Profession. Amsterdam/Philadelphia: John Benjamins.
Hurtado Albir, A. (Dir.) 1999. Enseñar a traducir. Madrid: Edelsa.
Kiraly, D. C. 2000. A Social Constructivist Approach to Translator Education; Empowering the
Translator. Manchester: St. Jerome.
Kübler, N. 2003. “Corpora and LSP translation”. In Corpora in translator education, F. Zanettin,
S. Bernardini, D. Stewart (eds), 25–42. Manchester: St. Jerome.
Maia, B. 2003. ‘Training Translators in Terminology and Information Retrieval using Compa-
rable and Parallel Corpora’. In Corpora in Translator Education, F. Zanettin, S. Bernardini
& D. Stewart, 43–54. Manchester: St. Jerome.
Mossop, B. 1998. “The workplace procedures of professional translators”. Paper read at the EST
Conference in Granada.
PACTE. 2003. “Building a Translation Competence Model”. In Triangulating Translation: Pers-
pectives in process oriented research, F. Alves (ed.), 43–66. Amsterdam: John Benjamins.
Rodríguez Inés, P. 2008. Uso de corpus electrónicos en la formación de traductores (inglés-es-
pañol-inglés). PhD thesis. Departament de Traducció i d’Interpretació. Universitat Au-
tònoma de Barcelona.
Stewart, D. 2001. “Poor Relations and Black Sheep in Translation Studies”. Target 12(2): 205–
228.
Varantola, K. 2003. “Translators and disposable corpora”. In Corpora in translator education,
S. Bernardini, D. Stewart, F. Zanettin (eds), 55–70. Manchester: St. Jerome.
Zanettin, F. 2001. “Swimming in words: Corpora, translation, and language learning”. In Learn-
ing with corpora, G. Aston (ed.), 177–197. Bolonia: CLUEB.
Zanettin, F., Bernardini, S. and Stewart, D. 2003. Corpora in Translator Education. Manchester:
St. Jerome.
Using corpora and retrieval software as a source of materials for the translation classroom*
Josep Marco and Heike van Lawick
Universitat Jaume I / Castelló de la Plana, Spain
* Research for this article has been conducted within the framework of two research projects:
HUM2006-11524/FILO, funded by the Spanish Ministry of Science and Innovation (with a
contribution from FEDER funds), and P1 1B2006-13, funded by the ‘Caixa Castelló – Bancaixa’
Foundation, as part of an agreement with the Universitat Jaume I.
This article starts from a twofold distinction: that between corpora as documentation
tools and corpora as a source of materials for the translation classroom, and that
between corpus-based and corpus-driven approaches. A pedagogic framework for
translator training is then outlined, in which the notion of objective is central and a
task-based methodology is used. Within such a framework, four kinds of corpus-related
tasks are presented and illustrated: cloze tests based on a bilingual corpus, multiple
choice exercises based on a learner corpus, translation of short passages yielded by
the concordancer, and concordance analysis. The first three are corpus-based, whereas
the last one is more corpus-driven and can be used to promote autonomous learning
and discovery strategies.
Key words: Translator training, corpora, task-based approach, corpus-based,
corpus-driven, cloze test, multiple choice exercise, concordancing, COVALT,
autonomous learning
1. The role of corpus-related resources in translator training
According to experts in second language acquisition (Partington 1998: 5–7; Aston
2000: 7), corpora and corpus interrogation tools can be used in two different but
complementary ways in language learning: as a means of autonomous learning,
when used by the student with little or no teacher mediation, and as a source of
materials for classroom use, developed by the teacher, who selects the samples
and controls their use with a view to achieving their pedagogic objectives. As
claimed by Bernardini, Stewart and Zanettin (2003: 4):
The use of corpora in language learning contexts was pioneered by Tim Johns,
who introduced concordancing into the foreign language classroom in the
1980s. Besides enabling language professionals such as lexicographers and mate-
rial writers to produce better reference and learning materials, and allowing lan-
guage teachers to create classroom activities based on real examples, he showed
how corpora could provide learners with direct access to virtually unlimited
language data.
The same distinction applies to corpus-related resources when used in a translator
training environment. In fact, both approaches are represented in the collective
volume just quoted, though far from equally: much more attention is paid to corpora
and corpus interrogation as documentation tools for the translator trainee than to
developing corpus-based classroom materials. Pearson (2003), for instance, argues
that parallel corpora play a complementary role
to comparable corpora in helping translators solve certain translation problems,
and she illustrates her point by referring to culture-specific information in a col-
lection of popular science articles and their translations. Kübler (2003) claims
that the use of corpora has to find its way into translator training objectives and
methodology, especially when the focus is on terminology. Specialized translators
have long relied on so-called parallel texts (in hard copy) when dealing with ter-
minology-related problems, so we are not talking about anything radically new;
but digitized corpora offer the great advantage of providing the translator with a
wealth of linguistic data on subject-specific terminology at the touch of a button.
Similarly, Varantola (2003) focuses on the use of ad hoc comparable corpora for
specific translation jobs. Since these corpora are compiled for the jobs in hand,
they are used and then disposed of, and are therefore referred to as disposable.
Varantola goes on to claim that “the knowledge of how to compile and use cor-
pora is an essential part of modern translational competence and should therefore
be dealt with in the training of prospective professional translators” (2003: 56).
What these three contributions have in common is that they envisage corpora as
repositories which can help students fill their knowledge gaps.
Even though this first use of corpora is better represented in the literature
generally and, more particularly, in Corpora in Translator Education, examples can
also be found of corpora as a source of classroom materials. Frankenberg-Garcia
and Santos (2003), for instance, illustrate a couple of contrastive features between
English and Portuguese which may give rise to translation problems and which
can be adequately dealt with through activities based on Compara, the
Portuguese-English Parallel Corpus. Bowker and Bennison (2003) hint at the
pedagogic potential of learner corpora, which in a translator training environment
would be taken to mean collections of texts produced by students as part of their
learning activity. These authors claim that “[w]ith regard to pedagogy, a corpus of
student translations can provide a means of identifying areas of difficulty that could
then be integrated into the curriculum and discussed in class” (2003: 103), but
(unlike Frankenberg-Garcia and Santos) they do not illustrate their point with
specific translation tasks. Cosme (2006), by contrast, provides both an overview of
corpus-based translation tasks and specific instances that can be used in class.
Drawing on a bidirectional English-French, French-English parallel corpus of fiction
and newspaper texts, this author identifies three kinds of ‘exercises’:
awareness-raising, translation enhancement and production (2006: 97).
There is another distinction which partly overlaps with the one drawn in the
previous paragraphs: that between corpus-based and corpus-driven approaches.
According to Tognini-Bonelli (2001), “the term corpus-based is used to refer to a
methodology that avails itself of the corpus mainly to expound, test or exemplify
theories and descriptions that were formulated before large corpora became avail-
able to inform language study” (2001: 65). In other words, the theory precedes the
data, and the data are mainly used in support of the theory. In the corpus-driven
approach, on the other hand, “[t]he corpus (…) is seen as more than a reposi-
tory of examples to back pre-existing theories or a probabilistic extension to an
already well defined system. The theoretical statements are fully consistent with,
and reflect directly, the evidence provided by the corpus” (2001: 84). When ap-
plied to language learning, this dichotomy implies a difference in focus, either
on the teacher (corpus-based) or the student (corpus-driven). In this respect,
Bernardini (2004) refers to Johns’ work on data-driven learning, which suggests
that “learners should be guided to discover the foreign language, much in the same
way as corpus linguists discover facts of their own language that had previously
gone unnoticed” (2004: 16). Although she identifies with the general principles
guiding Johns’ approach, she makes an important qualification when she sug-
gests that placing the learner on an equal footing with the researcher is perhaps
unrealistic, and goes on to put forward an alternative metaphor: the learner as
traveller who follows their own interests in a process of discovery. This distinction
between corpus-based and corpus-driven learning can also be applied to transla-
tor training, with the only difference that what is out there to be discovered or
somehow apprehended is not a foreign language but several aspects of translator
competence – whether knowledge gap-filling strategies, cross-linguistic features
or techniques used by professional translators in order to solve a problem.
In this paper, we intend to put forward concrete corpus-based and corpus-
driven activities for the translation classroom. The focus will be on so-called gen-
eral – i.e. non-specialized – translation (Hurtado 1999: 99, 2001: 166), or else on
literary translation, with English and German as source languages and Catalan as
target language. Therefore, we will not be dealing with such problems as subject-
specific terminology or specialized genre conventions. However, before present-
ing the activities, let us look briefly at the pedagogic assumptions underlying our
proposal.
2. Pedagogic assumptions: Objectives and methodology
Within the field of translator training, Delisle (1980, 1993, 1998) has laid great
emphasis on the importance of the notion of learning objective when planning a
translation course. Hurtado (1999, 2001) subscribes to Delisle’s view and goes on
to identify four groups of objectives that must inform a general translation course
(2001: 167): methodological, contrastive, professional and textual. Methodological
objectives have to do with the principles guiding the translation process; contrastive
objectives are related to basic contrastive features between the two languages
involved; professional objectives take account of the skills the prospective
translator needs in order to enter the marketplace, i.e. to become a member of the
professional community they aspire to join;1 finally, textual objectives deal with the
kinds of problems that arise in the negotiation of different text types and,
especially, genres.

As can be seen, contrastive objectives continue to be part and parcel of transla-
tor training, even if they seem to have had “a bad press” in the past. As an over-
reaction to the tenets of comparative stylistics, which presented translation as an
operation between languages, not between texts, the emphasis was laid, from the
1980s onwards, on the communicative aspects of translation. But since that shift
of emphasis is fully integrated into the discipline by now, it is perhaps time to
claim a more visible position for contrastive features. It must be remembered that
not even Delisle, who so eloquently criticized the basic assumptions of the com-
parative stylistics approach, banned cross-linguistic contrast from the domain of
translator training, but placed it at the outset (1980: 94) of what he calls a cours
d’initiation.2 And a recent English-Catalan translation handbook (Ainaud, Espunya
and Pujol 2003) devotes a long chapter to “elements of cross-linguistic contrast”.
Moreover, there is no incompatibility between contrastive and communicative or
textual considerations, since linguistic areas where contrast is particularly evident
are often well suited to performing important textual functions, as will be seen later on.

1. This is in line with Kiraly’s social constructivist approach to translator education, according
to which students at the periphery of the translation community are gradually drawn into
the community’s discourse until they are competent, full-fledged members of the community
themselves (Kiraly 2000).
2. As claimed by Kelly (2005: 12), “Delisle’s translational approach is informed by the théorie
du sens, and also partly by the Canadian contrastivist tradition of Vinay and Darbelnet, despite
his criticism of their work”.

As to methodology, in our courses we follow a task-based approach, which is
arguably very well suited to the objectives listed above. The concept of task plays a
central role in the teaching methodology put forward by Hurtado (1999, 2001) and
González Davies (2003, 2004).3 The task-based methodology is rooted in the
communicative approach to second language learning that emerged in the 1970s. In
the development of communicative competence – the leading concern of second
language acquisition – a task-based approach confers the main role on the student,
who is expected to produce utterances in a communicative situation which is
simulated but modelled on a real situation of the second language culture.
Similarly, in the translation classroom it is the teacher’s role to plan and carry out
tasks leading progressively to the achievement of a given learning objective. Tasks
are grouped into teaching units, and teaching units are closely connected with one
or more learning objectives. Curriculum design is, therefore, hierarchically
conceived and its components are interdependent: from well-defined objectives, to
teaching units embodying those objectives, to specific tasks leading by stages to the
achievement of a given objective.
3. On this point, see also Kelly (2005: 16–17).

3. Designing corpus-related translation tasks
Translator trainers can draw on various corpora in order to design translation tasks.
Different corpora lend themselves to different kinds of pedagogic exploitation,
depending on whether they are monolingual or multilingual, comparable or parallel,
general or subject-specific, etc. Without aiming to be exhaustive, we envisage the
following four kinds of tasks: cloze tests, multiple choice exercises, translation of
short passages yielded by a concordancer, and concordance analysis.
3.1 Cloze tests based on a bilingual corpus
Students are provided with the source text and with a target text containing gaps
that they are asked to fill in (see, for instance, Frankenberg-Garcia and Santos
2003). The gaps correspond to the translation problem that the teacher wants the
students to deal with. The main advantage of this kind of exercise is that it allows
the class to focus on a specific translation problem, leaving aside all other aspects
of a text which, interesting as they may be, are perceived at a given moment as
peripheral to the issue in hand. Cloze tests, needless to say, can never be the main
kind of activity carried out in a translation class, as they are by nature reductive;
but this weakness becomes their main strength when they are regarded as a task
which enables the class to concentrate on a given translation problem in an
intensive way (see Lawick 2006).
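The gap-making step of such a cloze test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the COVALT or AlfraCOVALT tooling: it assumes the bilingual corpus is already available as a list of sentence-aligned pairs, and the names AlignedPair, find_pairs and make_cloze are hypothetical. The teacher still decides which target span to blank out; the code only filters candidate pairs by whole-word source keywords (here als and wenn) and replaces the chosen span with a gap. The German sentence in the toy data is an invented rendering of the Catalan example quoted in note 7 of this chapter.

```python
import re
from dataclasses import dataclass

GAP = "_____"

@dataclass
class AlignedPair:
    """One sentence-aligned source/target pair (hypothetical corpus format)."""
    source: str
    target: str

def find_pairs(pairs, keywords):
    """Keep pairs whose source sentence contains any keyword as a whole word."""
    alternation = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return [p for p in pairs if pattern.search(p.source)]

def make_cloze(pair, target_span):
    """Replace the teacher-chosen span in the target with a gap; the source stays intact."""
    if target_span not in pair.target:
        raise ValueError(f"span {target_span!r} not found in target sentence")
    return AlignedPair(pair.source, pair.target.replace(target_span, GAP, 1))

if __name__ == "__main__":
    # Invented toy data, not taken from COVALT.
    pairs = [
        AlignedPair("Wenn es schneit, wird der Berg weiß.",
                    "Quan neva, la muntanya torna blanca."),
        AlignedPair("Der Berg ist weiß.", "La muntanya és blanca."),
    ]
    for pair in find_pairs(pairs, ["als", "wenn"]):
        cloze = make_cloze(pair, "Quan neva")
        print(cloze.source)
        print(cloze.target)
```

Matching on word boundaries keeps the filter from picking up words such as Also, which merely contain als; deciding which clause is the gap remains a pedagogic choice rather than an automatic one.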
Appendix 1 provides two tasks dealing with the German conjunctions4 als and wenn.
Given the semantic complexity of these conjunctions, in the first task students are
asked to identify their different values and functions in different contexts. Samples
have been selected from the monolingual German corpus Wortschatz-Portal,5
compiled at Leipzig University and representing real situations of use of present-day
German. This large online corpus can be easily accessed and handled, allowing
students to obtain linguistic information without spending much time and effort,
and encouraging them to work autonomously and discover for themselves that
corpora offer more (and different) information than dictionaries. They are therefore
asked to look for further examples in that corpus before carrying out the cloze test.
4. Although the terms connective and conjunction are generally used as synonyms in current
grammars, we prefer the latter, following the criterion of the Duden (2001), where
Konjunktion (“conjunction”) denotes a lexical element connecting clauses, phrases,
words or constituents of a phrase or a word.
5. The Wortschatz-Portal <http://wortschatz.uni-leipzig.de/> contains 6 million words and
offers not only a list of concordances, but also information on significant neighbours and
graphical representations of co-occurrences.

The central task presents a cloze test (with Joseph Roth’s Die Flucht ohne Ende as
source text and its Catalan translation, La fugida sense fi, as target text, both
belonging to the COVALT corpus)6 in which students are expected to fill in the gaps
in the target text corresponding to the clauses introduced by als and wenn in the
source text. Thus, learners will apply what they have learned in the previous
exercise, with the help of the translated context. But let us first see why we consider
these conjunctions a translation problem.
6. COVALT (Corpus Valenciano de Literatura Traducida, or “Valencian Corpus of Translated
Literature”) is a multilingual corpus – still under construction – made up of the Catalan
translations of narrative works originally written in English, French and German and published
in the autonomous region of Valencia from 1990 to 2000. It currently includes 70 source text +
target text pairs, amounting to about 4 million words. Corpus analysis is carried out by means of
AlfraCOVALT, a bilingual concordancing program developed within the COVALT research
group by Josep Guzman (see Guzman, forthcoming). The COVALT group, based at Universitat
Jaume I (Castelló, Spain), has received financial support from several research projects, funded
by the “Caixa Castelló-Bancaixa” Foundation (within an agreement with Universitat Jaume I)
and by the Spanish Ministry of Science and Technology.
The conjunctions als and wenn have been singled out for study because stu-
dents, unaware of their polysemy, tend to translate them automatically into Cata-
lan as quan (“when”) and si (“if ”), respectively. In fact, Ainaud, Espunya and Pujol
devote a section (2003: 198–201) of their English-Catalan translation handbook
to the problems arising in the translation of connectives, starting with their often
polysemous character. But the German conjunctions als and wenn pose problems
to translator trainees not only on the grounds of their polysemous character, but
also of their partial overlap in temporal meaning. Indeed, this overlap results in
pragmatic ambiguity, insofar as only the context can determine which meaning
prevails in each case, a phenomenon closely related to polysemy, linguistic change
and grammaticalization (Sweetser 1990). In what follows, a brief description is
provided of the main functions of the German conjunctions als and wenn, as a
backdrop to the tasks in Appendix 1.
The polysemy of als lies mainly in its double role as a temporal and as a modal
conjunction. In its latter function it may introduce a subordinate clause, a part of
a sentence or a word; furthermore, it may occur in complex expressions (sowohl...
als auch; insofern, als; zu... als dass). In most cases the modal als introduces a
comparison or a specification.
In its temporal sense, als usually introduces a subordinate clause referring to
events that (a) occurred in the past simultaneously with the events expressed in the
main clause, indicating a certain point in time; (b) occurred in the past before the
events expressed in the main clause; or (c) occurred in the past after the events
expressed in the main clause.
The main values of wenn are time and condition, two values that are closely
interrelated, and not only in German (Drosdowski et al. 1984: 700). The use of temporal
notions like before and after in order to define more abstract notions like cause and
effect has been observed by several scholars (Drosdowski et al. 1984: 697; Cuenca
1992–1993 and 1999: 173; Pérez Saldanya and Salvador 1995: 91). The conjunction
nachdem, for example, has a temporal and a causal meaning, although the latter is
no longer used in standard German. For Catalan, Salvador (2002: 2989) highlights
the semantic proximity between certain causal and conditional clauses, on the one
hand, and between the latter and temporal clauses, on the other.7
The cognitive approach to polysemous phenomena enables us to see this kind of
evolution as occurring across languages; that is, it may account for the change (or
extension) from a temporal to a conditional value in the case of the German
conjunction wenn, as well as for the fact that the Catalan contextual equivalent of
wenn in its conditional meaning is often the temporal conjunction quan. But the
temporal sense of als is also normally translated as quan, which makes this kind of
translation problem more complex. A further value of wenn (usually in combination
with auch) is concessivity, corresponding to the Catalan encara que (“even if”, “even
though”).8 The relationship between all these values has previously been worked
out in class using the Wortschatz corpus, so that students can easily cope with the
cloze test, which is intended to ensure comprehension and allow them to solve this
kind of translation problem deliberately. At the same time, given the context, they
are encouraged to look for translations which they might not have thought of before.
7. According to this author, the utterance Quan neva, la muntanya torna blanca (“When
it snows, the mountain turns white”) is equivalent to a generic interpretation of Si neva, la
muntanya torna blanca (“If it snows, the mountain turns white”) (Salvador 2002, Note 8).
8. Salvador (2002: 2980) calls this kind of utterance condicionals concessives (“concessive
conditionals”) (b), situated between condicionals (“conditionals”) (a) and concessives pures
(“pure concessives”) (c): (a) Si fa sol, l’excursió serà molt agradable (“If it’s sunny, the trip will
be very pleasant”); (b) Encara que no faci sol, l’excursió serà molt agradable (“Even if it’s not
sunny, the trip will be very pleasant”); (c) Tot i que no ha fet sol, l’excursió ha estat molt
agradable (“Even though it hasn’t been sunny, the trip has been very pleasant”).

3.2 Multiple choice exercises based on a learner corpus
Learner corpora (see, for instance, Aston 2000; Osborne 2000; Bernardini 2004 for the
use of learner corpora in second language acquisition, or Bowker and Bennison
2003, mentioned above, for learner corpora and translator training) can be used to
implement multiple choice exercises. Students are provided with the source text and
then with a number of fragments from it, each accompanied by the corresponding
fragments from different student translations; they are then asked to distinguish
between translations that are both correct and adequate, translations that are
incorrect, and translations that are more or less correct but inadequate, for whatever
reasons. The rationale behind this is, as Pym (1992) suggests, that in translation it is
sometimes possible to tell right from wrong, but more often than not translation
“errors” are inadequacies rather than plain mistakes. Plain mistakes are called binary
errors (something can be declared a mistake on the authority of grammar and the
dictionary); inadequacies are known as non-binary errors.
Appendix 2 illustrates this possibility in corpus-based task design. The original
excerpts are taken from Graham Swift’s novel Last Orders (Swift 1996), and the
translated excerpts from an ad hoc learner corpus made up of student translations
of an extended passage (about 300 words) of the novel. Of course, the task as
presented in the appendix is not complete. For reasons of space, the extended
passage in question is not included, as a result of which the short excerpts and their
multiple-choice translations are completely decontextualized. In the real classroom
situation where the task was carried out (within the framework of a literary
translation course at Universitat Jaume I), students were familiar with the novel
and, therefore, with the extended passage. This kind of activity is intended to
enhance the translator trainee’s critical sense, as it forces them to consider different
translation possibilities and give each one of them its due, always in a reasoned
way. In this particular case, the focus is on questions of register and phraseology,
as the main characters in the novel are working-class Londoners whose speech –
rich in colloquial expressions and idiomatic turns of phrase – gives the text its
distinctive flavour.
3.3 Translation of short passages yielded by the concordancer
The teacher selects different passages from among the results yielded by the
concordancer for a given query, and students are asked to translate them. The advantage
of this kind of exercise is again that it enables the class to concentrate on a spe-
cific translation problem through a battery of examples which the trainer deems
representative of the issue in hand. It would have been possible to implement this
kind of activity in the pre-corpus age, but then it would have been extremely time-
consuming to collect relevant examples manually.
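The retrieval side of such a drill rests on a key-word-in-context (KWIC) search. The sketch below is a minimal Python stand-in for a concordancer such as SARA, not a model of SARA's actual interface: the function names kwic and passages_with are hypothetical, sentence splitting is deliberately naive (it breaks after ., ! or ? followed by whitespace), and the sample text is invented. The teacher would then pick, from the returned passages, those deemed representative of the problem in hand.

```python
import re

def _keyword_pattern(keyword):
    """Whole-word, case-insensitive pattern for the query term."""
    return re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)

def kwic(text, keyword, width=30):
    """Return (left context, match, right context) triples, one per hit."""
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    hits = []
    for m in _keyword_pattern(keyword).finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits

def passages_with(text, keyword):
    """Return whole sentences containing the keyword (naive sentence splitting)."""
    flat = " ".join(text.split())
    sentences = re.split(r"(?<=[.!?])\s+", flat)
    pattern = _keyword_pattern(keyword)
    return [s for s in sentences if pattern.search(s)]

if __name__ == "__main__":
    sample = "Now, listen to me. The train left.\nI am busy now."
    for left, hit, right in kwic(sample, "now"):
        print(f"{left:>30} [{hit}] {right}")
    for sentence in passages_with(sample, "now"):
        print("-", sentence)
```

Returning whole sentences rather than fixed-width windows suits a translation drill, since students need a syntactically complete unit to translate; the KWIC view remains useful for the teacher's initial scan.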
Appendix 3 provides an example of this kind of “translation drill”, centred
upon the conjunctive meaning of English now. Samples are taken from the British
National Corpus, freely accessible on-line through SARA (a concordancing tool).
However, translation of (a selection of) query matches will be supplemented by
a preparatory task (identification of the different values and meanings of English
now, as in the task included in Appendix 1) and a final task (translation of a lon-
ger passage in which now, used as a discourse marker, can be seen at work in a
wider perspective). Thus the three-step progression (identification of the different
values and meanings of an adverb/connector ⇒ translation of at least one short
passage for each meaning identified ⇒ translation of a longer passage in which
one meaning of the adverb/connector is illustrated on a larger scale) allows the
translation class to move from the paradigmatic (possible values of a given item
as they would be found in a corpus-based dictionary) to the syntagmatic axis
(how the item in question is embedded in a given text, how it contributes to the
development of the text’s argumentative or expositive patterns, etc.) and there-
fore bridges the gap between cross-linguistic contrast and meaning or function in
context. Moreover, this textual justification ties in particularly well with a cogni-
tive one, as the progression envisaged allows an interplay between analysis (what
different meanings of now can you identify?) and synthesis (decide which mean-
ing is activated in this instance and translate in a contextually appropriate way).
As Danielsson argues (quoted in Bernardini 2004: 18), “as the units […] get longer on
the syntagmatic scale, the paradigmatic choices tend to get fewer”.
The development of translation tasks focusing on different values and functions
of now was prompted by our awareness of the difficulties experienced by translator
trainees when faced with now as a discourse marker. Students are well acquainted
with the primary meaning of now, i.e. an adverb with present time reference, but
they often seem to be utterly unacquainted with the other uses of the word, which
results in incoherent renderings that are bound to distort the import of a given
argument. Therefore, the aim of the tasks in Appendix 3 is to make students aware
of the uses of now as a discourse marker, which (according to the 1995 edition of
the Collins Cobuild English Dictionary) could be paraphrased as follows:
a. “to indicate to the person or people you are with that you want their attention,
or that you are about to change the subject”;
b. “when they are thinking of what to say next”;
c. “to give a slight emphasis to a request or command”;
d. “to introduce information which is relevant to the part of the story or ac-
count that you have reached, and which needs to be known before you can
continue”;
e. “to introduce something which contrasts with what you have just said before”;
f. “You can say ‘now, now’ as a friendly way of trying to comfort someone who
is upset or distressed”.
All these values of now are identified by the Collins Cobuild English Dictionary as
belonging to pragmatics and as being typical of spoken and/or informal English,
even though some of the examples included in the tasks are taken from written
language. These examples illustrate the time adverb use of now and all the
non-temporal uses just listed except b and c. The illustrated uses sometimes
overlap, but each of them is present to some degree in the corpus samples provided.
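When preparing the identification task, a teacher might want to pre-sort concordance lines for now before reading them closely. The Python sketch below does this with a deliberately crude surface heuristic that is our own assumption, not the Collins Cobuild classification: a sentence-initial Now followed by a comma or by then is flagged as a discourse-marker candidate, and Now, now is flagged separately (use f in the list above). Everything else is left unclassified, so every label still needs checking by hand; the temporal reading, in particular, is never positively identified.

```python
import re

# Crude surface cues (assumptions for this sketch, not dictionary criteria).
COMFORT = re.compile(r"^now,\s*now\b", re.IGNORECASE)
MARKER = re.compile(r"^now\s*(?:,|then\b)", re.IGNORECASE)

def label_now(sentence: str) -> str:
    """Assign a provisional label to a sentence containing 'now'."""
    s = sentence.strip()
    if COMFORT.match(s):
        return "now-now"           # candidate for use f: comforting someone
    if MARKER.match(s):
        return "marker-candidate"  # sentence-initial and set off: possibly one of uses a to e
    return "unclassified"          # includes most time-adverb uses

if __name__ == "__main__":
    lines = [
        "Now, now, don't cry.",
        "Now, listen to me.",
        "Now then, where was I?",
        "I am busy now.",
    ]
    for line in lines:
        print(f"{label_now(line):<17} {line}")
```

The heuristic only narrows down where to look; deciding which value is activated in context is precisely the analytical work the task asks students to do.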
3.4 Concordance analysis
All of the tasks put forward so far (those included in Appendixes 1, 2 and 3) are
corpus-based, insofar as they deal with contrastive features and translation prob-
lems identified by the teacher and then incorporated into the classroom materials.
Their novelty lies in the fact that corpora (whether monolingual, parallel or learner)
and corpus interrogation tools are used, but they might equally have been designed
in a more traditional way, with hard-copy materials – only the whole process would
have been maddeningly time-consuming. But, as suggested above,
translation tasks may also be corpus-driven, with a varying amount of guidance
on the teacher’s part. Let us consider some examples.
A parallel corpus such as COVALT lends itself to data-driven exploitation in
several ways. The most generic one would be the teacher encouraging students to
use it as a documentation tool for any equivalence-related problem not broached
by the bilingual dictionary. But the scope for autonomous research and – indeed –
discovery can be narrowed and made more specific by the teacher suggesting rich
points or problematic areas. In this respect, the learner may be prompted to dis-
cover what techniques are used by translators when faced with words or multi-
word strings which have no readily available equivalents, or what norms govern
translation decisions with regard to such problems as the translation of sub-stan-
dard forms and cultural elements.
As to the former, a case in point might be that of evaluative adjectives (Marco
2006), i.e. adjectives that convey a certain degree of evaluation on the speaker’s part
and therefore reveal their attitude towards something. These adjectives often have
a semantic spectrum which overlaps only partially with the semantic spectrum of
their closest equivalents in the receiving language. That would be the case of Eng-
lish adjectives grotty, chintzy or wan, for instance, and their Catalan or Spanish dic-
tionary equivalents. Another case in point is that of body language words (Marco
and Guzman 2007), i.e. words – normally verbs or nouns – that convey some sort
of bodily expression, such as stare, frown, shrug, sniff or gasp. Two aspects of this set
of lexical units become immediately apparent when regarded from a contrastive
viewpoint. Firstly, they show different degrees of lexicalization across languages:
whereas in English these actions are fully lexicalized, in Catalan they are conveyed
by means of more or less fixed collocations of the type arrufar les celles (“to wrinkle
one’s eyebrows”) or arronsar els muscles (“to contract one’s shoulders”). And
secondly, the cultural import of the gestures or body motions differs. The results
that discovery procedures of this kind are likely to yield would undoubtedly go a
long way towards enhancing the learner’s awareness of translators’ creativity, as
actual translation solutions are often difficult to predict from dictionary
equivalents and show a high degree of variation.
Let us look briefly at the example provided by the English adjective dull. The
main meanings of dull are paraphrased as follows by the Collins Cobuild English
Dictionary: “not interesting or exciting”; “not very lively or energetic”; “not bright”
(when applied to colours or light); “very cloudy” (when applied to the weath-
er); “not very clear or loud” (when applied to sounds); “weak and not intense”
(when applied to feelings); and “not sharp” (when applied to a knife or blade).
These meanings are variously reflected in the English-Catalan dictionary, and the
equivalents provided by the dictionary typically find their way into the translation
solutions yielded by the COVALT corpus. However, not all these solutions
had been foreseen by the dictionary, and it is precisely in that area of mismatch
between dictionary and actual practice (as witnessed by corpus query matches)
that the translator’s creativity can be seen at work. Here are some concordance
results showing translation solutions that would be hard to predict from dictionary
information (back translations from Catalan are provided in brackets).
(1) De Quincey, Thomas. The Confessions of an English Opium-Eater / Confessions d’un opiòman anglés.
EN: should become an opium-eater, the probability is that (if he is not too DULL to dream at all) he will dream about oxen
CA: haguera acabat menjant Opi, probablement somniaria bous (llevat que, de tan soca, no somiara en absolut) (“unless he is such a blockhead that he does not dream at all”)
(2) Doyle, Arthur Conan. The Adventure of the Bruce-Partington Plans / Sherlock Holmes i els plànols del Bruce Partington
EN: The London criminal is certainly a DULL fellow,
CA: El criminal de Londres és, ben segur, un individu poc espavilat (“not a very sharp individual”)
EN: I am DULL indeed not to have understood its possibilities
CA: Sóc un vertader badoc per no haver-ne comprés les possibilitats (“I am a real fool for not having understood the possibilities”)
(3) Conrad, Joseph. Typhoon / Tifó
EN: give me the DULLest ass for a skipper before a rogue
CA: , doneu-me per patró l’ase més ase abans que un bergant (“give me the most foolish ass for a skipper before a rogue”)
As to the second aspect mentioned above – norms governing translation decisions
with regard to specific translation problems – corpus evidence can be used in the
classroom to lend support (or otherwise) to commonly held assumptions, norms
or even universals. One norm often assumed to operate in Catalan translation is
the tendency to neutralize source-text features that signal dialectal origin, vulgar
register or, more generally, sub-standard usage. One possible way of tracing this
norm would be to search for occurrences of the common
sub-standard auxiliary form ain’t in the parallel corpus. In the following query
matches for ain’t found by the COVALT concordancing tool, AlfraCOVALT (see
Guzman and Serrano 2006; Guzman, forthcoming), all the translated fragments
are perfectly standard Catalan – and they are representative of the English-Cata-
lan sub-corpus at large:
(4) London, Jack. White Fang / Clau blanc
EN: But we AIN’T got people an’ money an’ all the rest, like him,
CA: Perquè nosaltres no tenim família, ni diners, ni res de res, com tenia ell
EN: They jes’ know we AIN’T loaded to kill,
CA: Saben ben bé que no estem preparats per a exterminar-los
EN: An’ I’ll bet it AIN’T far from five feet long
CA: I em jugaria el coll que fa més de metre i mig de llargària
(5) London, Jack. The Cruise of the Dazzler / El creuer del Dazzler
EN: I’m ‘Reddy’ Simpson, an’ you AIN’T licked the fambly till you’ve licked me
CA: Jo sóc Panotxa Simpson i no hauràs guanyat la família fins que no em derrotes
EN: Bli’ me, if ‘ere they AIN’T snoozin’,
CA: – Que em pengen si no estan roncant
(6) London, Jack. The Call of the Wild / La crida del bosc
EN: You AIN’T going to take him out now
CA: No deu voler soltar-lo ara, veritat
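A query of this kind can be pictured with a short sketch. For illustration only, it assumes that the sentence-aligned corpus is available as a list of (source, target) string pairs. The function name `parallel_concordance` is hypothetical and does not describe how AlfraCOVALT itself works. As in the examples above, hits are upper-cased on the source side, and straight and curly apostrophes are treated alike.

```python
import re
from typing import Iterable

# Hypothetical representation of a sentence-aligned parallel corpus:
# each item is a (source sentence, target sentence) pair.
AlignedPair = tuple[str, str]


def parallel_concordance(pairs: Iterable[AlignedPair], word: str) -> list[AlignedPair]:
    """Return the aligned pairs whose source side contains `word` as a whole word.

    Matches are upper-cased in the source sentence, mimicking the display
    used in the concordance examples; the target sentence is left untouched.
    """
    # Let a straight apostrophe in the query also match a curly one (ain't / ain’t).
    escaped = "['’]".join(re.escape(part) for part in word.split("'"))
    # Whole-word, case-insensitive match (no word character on either side).
    pattern = re.compile(rf"(?<!\w){escaped}(?!\w)", re.IGNORECASE)
    hits = []
    for source, target in pairs:
        if pattern.search(source):
            highlighted = pattern.sub(lambda m: m.group(0).upper(), source)
            hits.append((highlighted, target))
    return hits


if __name__ == "__main__":
    sample = [
        ("But we ain’t got people an’ money", "Perquè nosaltres no tenim família, ni diners"),
        ("You ain't going to take him out now", "No deu voler soltar-lo ara, veritat"),
        ("The London criminal is certainly a dull fellow",
         "El criminal de Londres és, ben segur, un individu poc espavilat"),
    ]
    for source, target in parallel_concordance(sample, "ain't"):
        print(source, "|", target)
```

Reading the target column of such output is still the student's task. The sketch only gathers the evidence against which the neutralization norm can be checked.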
In the area of cultural elements, on the other hand, it might be interesting to find
out how street names, for instance, fare in translation. In English-Catalan transla-
tion, there is no generally agreed-upon way of dealing with such nouns as Street,
Avenue, Road, etc. when they are part of a proper noun; therefore they are some-
times translated (as carrer, avinguda and so on) and sometimes left untouched in
the target text. However, it would be highly informative to determine quantitatively
whether or not there is a prevailing tendency in contemporary translation practice.
This is just to illustrate how corpus evidence can help throw light on some
contrastive features and translation problems. Of course, many more areas of
difficulty could be illuminated in a similar way.
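A quantitative check of the street-name question could begin with a rough tally like the sketch below. It makes two simplifying assumptions. First, aligned (source, target) pairs are available. Second, only Street and Avenue are tracked, with carrer and avinguda as their Catalan counterparts. The sample sentences are invented for illustration and are not COVALT data. Every match would still need to be checked by hand, because the patterns are deliberately crude.

```python
import re
from collections import Counter

# Illustrative mapping only: the two Catalan equivalents mentioned in the text.
# A real study would need a fuller inventory (Road, Lane, Square, ...).
CATALAN_EQUIVALENTS = {"Street": "carrer", "Avenue": "avinguda"}

# A capitalised word followed by Street or Avenue, e.g. "Baker Street".
SOURCE_PATTERN = re.compile(r"\b([A-Z][\w'-]*) (Street|Avenue)\b")


def tally_street_names(pairs):
    """Classify each source-text street name by how its target sentence renders it."""
    counts = Counter()
    for source, target in pairs:
        for name, noun in SOURCE_PATTERN.findall(source):
            if re.search(rf"\b{re.escape(name)} {noun}\b", target):
                counts["retained"] += 1    # English form kept, e.g. "Baker Street"
            elif re.search(rf"\b{CATALAN_EQUIVALENTS[noun]}\b", target, re.IGNORECASE):
                counts["translated"] += 1  # e.g. "carrer d'Oxford"
            else:
                counts["other"] += 1       # omitted or paraphrased: inspect manually
    return counts


if __name__ == "__main__":
    sample = [  # invented examples
        ("He lived in Baker Street.", "Vivia a Baker Street."),
        ("They walked down Fifth Avenue.", "Van baixar per la Cinquena Avinguda."),
        ("She reached Oxford Street.", "Va arribar al carrer d'Oxford."),
        ("A shop on Bond Street.", "Una botiga cèntrica."),
    ]
    counts = tally_street_names(sample)
    total = sum(counts.values())
    for label, n in counts.items():
        print(f"{label}: {n} ({100 * n / total:.0f}%)")
```

Run over a whole sub-corpus, such proportions would show whether translation or retention prevails.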
Finally, learner corpora might also be used in an inductive way as part of
comparable corpora including professional work. This would parallel the kind of
work envisaged by Bernardini (2004) for the language learner. According to this
author (2004: 19), who draws on previous work done in the area of Computer As-
sisted Language Learning,
if learners are presented with concordances showing the typical errors they (sta-
tistically) appear to make, and with similar textual environments where the same
structure is used appropriately, they may find it easier to become aware of more
or less fossilized characteristics of their interlanguage, thus potentially initiating
a process of knowledge restructuring.
It would be possible, along the same lines, to present the translator trainee with a
comparable corpus made up of two components: a learner component, consist-
ing of translations produced by trainees in an academic setting, and a profes-
sional component, consisting of published translations, i.e. the output of profes-
sional work. Such a corpus would be comparable in its target text component, as it
would place side by side the production of learners and professionals; but it would
be still more useful if it were parallel as well, that is to say, if it included
the corresponding source texts. As argued by Bernardini for the
language learner, the translator trainee’s awareness of typical mistakes or errors
might very well be enhanced by comparison of professional and learner output.
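One quantitative entry point into such a learner/professional comparison is to normalise the frequency of an item in each component, since the two components will rarely be the same size. The sketch below makes several assumptions. Each component is available as plain text, only single-word items are handled, and the function names are hypothetical. A frequency gap is only a prompt for reading the relevant concordance lines, not evidence of error in itself.

```python
import re


def per_10k(text: str, item: str) -> float:
    """Occurrences of a single-word `item` per 10,000 tokens of `text`."""
    # Crude tokeniser: runs of word characters, keeping apostrophe-joined forms (l’ase).
    tokens = re.findall(r"\w+(?:['’]\w+)*", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token == item.lower())
    return 10_000 * hits / len(tokens)


def compare_components(learner: str, professional: str, item: str) -> tuple[float, float]:
    """Normalised frequency of `item` in the learner and professional components."""
    return per_10k(learner, item), per_10k(professional, item)


if __name__ == "__main__":
    learner, professional = compare_components(
        "aleshores va dir que aleshores no", "llavors va dir que no", "aleshores"
    )
    print(f"learner: {learner:.1f} per 10k, professional: {professional:.1f} per 10k")
```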
4. Conclusion
The obvious conclusion to be drawn from this article is that translator training
can benefit a great deal from what corpora and corpus analysis have to offer. The
emphasis has been on corpus use and learning to translate (as opposed to learning
corpus use to translate). Furthermore, corpus resources have been envisaged –
mainly but not exclusively – as sources for classroom materials and activities rather
than documentation tools. Emphasis on the former is due to the fact that the latter
has attracted more attention so far, and we have sought to redress the balance as
far as possible with corpus-based tasks for the translation classroom; but we have
also pointed out ways in which corpus-driven activities could be implemented
(either inside or outside the class). Cloze tests, multiple choice exercises based on
learner corpora and translation of short fragments yielded by concordancers are
examples of corpus-based activities; concordance analysis based on COVALT, for
instance, can give rise to semi-autonomous or fully autonomous corpus-driven
tasks, only partly (if at all) guided by the teacher, focusing on particular problems
or sets of problems. The more highly skilled the student is in the use of corpora,
the more autonomous their work can be. However, this sort of skill – like many
others – is not acquired overnight, and it makes sense to move along the two axes
(corpus-based and corpus-driven work) simultaneously, so that the student can
gradually shift from the former to the latter. Only a certain degree of competence
on the trainee’s part can justify the claim that “[t]he greatest pedagogic value of
the instrument [i.e. corpora] lies, we suggest, in its thought-provoking, rather than
question-answering, potential” (Bernardini, Stewart and Zanettin 2003: 11).
References
Ainaud, J., Espunya, A. and Pujol, D. 2003. Manual de traducció anglès-català. Vic: Eumo.
Aston, G. 2000. “Corpora and language teaching”. In Rethinking Language Pedagogy from a
Corpus Perspective, L. Burnard and T. McEnery (eds.), 7–17. Frankfurt: Peter Lang.
Bernardini, S., Stewart, D. and Zanettin, F. 2003. “Corpora in Translator Education: An Intro-
duction”. In Corpora in Translator Education, F. Zanettin, S. Bernardini and D. Stewart
(eds.), 1–13. Manchester: St. Jerome.
Bernardini, S. 2004. “Corpora in the classroom: An overview and some reflections on future
developments”. In How to Use Corpora in Language Teaching, J. M. Sinclair (ed.), 15–36.
Amsterdam/Philadelphia: John Benjamins.
Bowker, L. and Bennison, P. 2003. “Student Translation Archive and Student Translation Track-
ing System: Design, Development and Application”. In Corpora in Translator Education,
F. Zanettin, S. Bernardini and D. Stewart (eds.), 103–117. Manchester: St. Jerome.
Collins Cobuild English Dictionary. 1995 [2nd ed.]. London: HarperCollins.
Cosme, C. 2006. “Clause combining across languages. A corpus-based study of English-French
translation shifts”. Languages in Contrast 6(1): 71–108.
Cuenca, M. J. 1992–1993. “Sobre l’evolució dels nexes conjuntius en català”. Llengua i Literatura
5: 171–213.
Cuenca, M. J. 1999. Introducción a la lingüística cognitiva. Barcelona: Ariel.
Delisle, J. 1980. L’analyse du discours comme méthode de traduction. Ottawa: Editions de
l’Université d’Ottawa.
Delisle, J. 1993. La traduction raisonnée. Manuel d’initiation à la traduction professionnelle de
l’anglais vers le français. Ottawa: Editions de l’Université d’Ottawa.
Delisle, J. 1998. “Définition, rédaction et utilité des objectifs d’apprentissage en enseignement
de la traduction”. In Los estudios de traducción: un reto didáctico, I. García Izquierdo and
J. Verdegal (eds.), 13–43. Castelló: Servei de Publicacions de la Universitat Jaume I.
Deutscher Wortschatz, Universität Leipzig, Institut für Informatik: http://wortschatz.uni-leipzig.de/.
Drosdowski, G. et al. (eds.). 1984. Duden. Grammatik der deutschen Gegenwartssprache.
Mannheim: Bibliographisches Institut.
Frankenberg-Garcia, A. and Santos, D. 2003. “Introducing Compara, the Portuguese-Eng-
lish Parallel Corpus”. In Corpora in Translator Education, F. Zanettin, S. Bernardini and
D. Stewart (eds.), 71–87. Manchester: St. Jerome.
González Davies, M. 2004. Multiple Voices in the Translation Classroom. Amsterdam/Philadel-
phia: John Benjamins.
González Davies, M. (coord.). 2003. Secuencias. Tareas para el aprendizaje interactivo de la
traducción especializada. Barcelona: Octaedro-EUB.
Guzman, J. (forthcoming). “El uso de COVALT y AlfraCOVALT en el aprendizaje traductor”.
In Actas del XXIV Congreso Internacional de la Asociación Española de Lingüística Aplicada
(AESLA).
Guzman, J. and Serrano, A. 2006. “Alineamiento de frases y traducción: AlfraCOVALT y el
procesamiento de corpus”. Sendebar 17: 169–186.
Helbig, G. and Buscha, J. 1989 [12th ed.]. Deutsche Grammatik. Ein Handbuch für den Auslän-
derunterricht. Leipzig: VEB Verlag Enzyklopädie.
Hurtado, A. (dir.). 1999. Enseñar a traducir. Metodología en la formación de traductores e
intérpretes. Madrid: Edelsa.
Hurtado, A. 2001. Traducción y Traductología. Introducción a la Traductología. Madrid: Cátedra.
Kelly, D. 2005. A Handbook for Translator Trainers. Manchester: St. Jerome.
Kiraly, D. C. 2000. A Social Constructivist Approach to Translator Education. Manchester: St.
Jerome.
Kübler, N. 2003. “Corpora and LSP Translation”. In Corpora in Translator Education, F. Zanettin,
S. Bernardini and D. Stewart (eds.), 25–42. Manchester: St. Jerome.
Lawick, H. Van. 2006. “Adquisició de competències lingüístiques per a la traducció: una pro-
posta basada en el treball autònom amb corpus”. In Towards the Integration of ICT in Lan-
guage Learning and Teaching: Reflection and Experience, U. Oster and N. Ruiz (eds.). Cas-
telló de la Plana: Publicacions de la Universitat Jaume I (forthcoming).
Marco, J. 2006. “A Corpus-Based Approach to the Translation of Evaluative Adjectives as Mo-
dality Markers”. In Corpus Linguistics: Applications for the Study of English, A. M. Hornero,
M. J. Luzón and S. Murillo (eds.), 241–254. Bern: Peter Lang.
Marco, J. and Guzman, J. 2007. “A Corpus-based Analysis of Lexical Items Conveying Body
Language in the COVALT Corpus”. Belgian Journal of Linguistics 21: 155–170.
Osborne, J. 2000. “What can students learn from a corpus?: Building bridges between data and
explanation”. In Rethinking Language Pedagogy from a Corpus Perspective, L. Burnard and
T. McEnery (eds.), 165–172. Frankfurt: Peter Lang.
Partington, A. 1998. Patterns and Meanings. Using Corpora for English Language Research and
Teaching. Amsterdam/Philadelphia: John Benjamins.
PC-Bibliothek. Duden Deutsches Universalwörterbuch [cd-rom]. 2001 [4th ed.]. Mannheim:
Bibliographisches Institut and F.A. Brockhaus AG.
Pearson, J. 2003. “Using Parallel Texts in the Translator Training Environment”. In Corpora in
Translator Education, F. Zanettin, S. Bernardini and D. Stewart (eds.), 15–24. Manchester:
St. Jerome.
Pérez Saldanya, M. and Salvador, V. 1995. “Fraseologia de l’encara i processos de gramaticalit-
zació”. Caplletra 18: 85–108.
Pym, A. 1992. “Translation Error Analysis and the Interface with Language Teaching”. In
Teaching Translation and Interpreting. Training, Talent and Experience, C. Dollerup and
A. Loddegaard (eds.), 279–288. Amsterdam/Philadelphia: John Benjamins.
Roth, J. 1956/1975. Die Flucht ohne Ende. Ein Bericht. Amsterdam: Allert de Lange/Köln:
Kiepenheuer & Witsch.
Roth, J. 1995. La fugida sense fi. Alzira: Edicions Bromera [translated by Heike van Lawick].
Salvador, V. 2002. “Les construccions condicionals i les concessives”. In Gramàtica del català
contemporani, Vol. 3, J. Solà et al. (eds.), 2977–3025. Barcelona: Empúries.
Sweetser, E. E. 1990. From Etymology to Pragmatics: Metaphorical and Cultural Aspects of Se-
mantic Structure. Cambridge: Cambridge University Press.
Swift, G. 1996. Last Orders. London: Picador.
Tognini-Bonelli, E. 2001. Corpus Linguistics at Work. Amsterdam/Philadelphia: John Benja-
mins.
Varantola, K. 2003. “Translators and Disposable Corpora”. In Corpora in Translator Education,
F. Zanettin, S. Bernardini and D. Stewart (eds.), 55–70. Manchester: St. Jerome.
Appendix 1. Tasks with als and wenn for the German-Catalan translation classroom
1. Identify as many meanings of als and wenn as you can in the following
excerpts (temporal, modal, or conditional, establishing their specific role
in each case):
1. Kleine Strohhütten dienen den Tieren als Unterkunft.
2. Es wird als unhöflich betrachtet, jemandem ins Gesicht zu sagen, was man denkt.
3. “Regen auf dem Dach ist schöner als Fernsehen”, sagte unsere Wirtin.
4. Gibt’s das wirklich, dass jemand 365 Tage im Jahr nichts als kulinarische Hochkultur er-
trägt?
5. In Japan gelten andere Regeln bezüglich der Gastfreundschaft als hierzulande.
6. Manche staatlichen Reaktionen auf die Terroranschläge hätten den Eindruck hinterlassen,
als würden die Grundsätze der Menschenrechte Maßnahmen zur Bekämpfung des Terroris-
mus geopfert, warnte Robinson.
7. Die Natur befand sich in ungewöhnlicher Spendierlaune, als die das Tessin entwarf.
8. Gerade ein paar Wochen lang war sie auf Sendung, als auf einmal der Verlag “Kobunsha” bei
ihr anfragte, ob sie nicht ein Buch schreiben wolle.
9. Oliver Kahn antwortet nur, wenn er gefragt wird.
10. Erst wenn die Gutscheine aufgebraucht sind, müssen sie zahlen.
11. Sie haben als Jugendlicher, wenn sie nachts nach einer Party noch Hunger hatten, nie eine
Bratwurst oder Fischsemmel gegessen?
12. Was würde man selber tun, wenn Deutschland besetzt würde?
13. Auch wenn nicht alle getesteten Kindersitze die Kriterien der ADAC-Tester erfüllen können:
Jedes Sitzsystem schützt ein Kind besser, als wenn es ungesichert im Auto mitfährt.
2. Fill in the gaps with an adequate translation into Catalan
DE: Im Februar 1918 verlor Baranowicz den Daumen der linken Hand, als er unvorsichtig Holz sägte.
CA: Al febrer de 1918, Baranowicz, ______________________________, va perdre el dit gros de la mà esquerra.
DE: Irene nahm nach dem Krieg eine Stellung in einem Büro an, weil man sich damals anfing zu schämen, wenn man nicht arbeitete.
CA: Després de la guerra, Irene acceptà un treball en una oficina, perquè llavors la gent començà a avergonyir-se ______________________________.
DE: Als er seinen Namen nannte, wurde Tunda als Herr Kapellmeister behandelt.
CA: _____________ el seu nom, ______________________ senyor director d’orquestra.
DE: Sie war mutiger als die ganze männliche Schar, in deren Mitte sie kämpfte.
CA: Era ____________ tota la colla masculina enmig de la qual lluitava.
DE: Tunda aber sah sie, wenn er nahe vor ihr stand, wie in einem blassen Spiegel.
CA: Però Tunda, ______________________________ la veia com en un espill deslluït.
DE: In diesen Tagen begann Tunda, alle unbedeutenden Ereignisse niederzuschreiben, es war, als bekämen sie dadurch eine gewisse Bedeutung.
CA: En aquests dies, Tunda començà a apuntar tots els esdeveniments indiferents, ______________________________ significat.
DE: Wahrscheinlich wäre ich sehr glücklich, wenn sie geruhen würde, mir einen Auftrag zu geben.
CA: Probablement em faria molt feliç ______________________________ manar-me algun encàrrec.
DE: Es war der erste Satz, den sie direkt an mich gerichtet hatte, und sie sah mich nicht an, als wollte sie zu erkennen geben, daß sie, auch wenn sie zu mir sprach, nicht gerade unbedingt und nur zu mir sprach.
CA: Era la primera frase que m’ha dirigit directament, sense mirar-me, ______________________________ donar-me a entendre que, _______________ parlava, no ho feia ni incondicionalment, ni exclusivament a mi.
DE: Ich kann ihr auch Geld geben, wenn Du willst, daß sie zu Dir kommt.
CA: Puc donar-li diners, ____________________ reunir-se amb tu.
DE: Der Fremde gab an, daß er im Jahre 1916 als österreichischer Oberleutnant in ein sibirisches Kriegsgefangenenlager gekommen war.
CA: El foraster va declarar que, l’any 1916 ______________________________ el van dur a un camp de presoners siberià.
Appendix 2. A multiple choice task based on a corpus of student translations
1. Choose the best translation for each of the following excerpts and
motivate your choice. Say whether the other options are correct /
incorrect and / or more or less adequate.
1. So we shuffle on, down some steps and up some steps, past all these geezers made of stone,
lying face up, flat out, out for the count.
a. Ens vam posar a trescar, escales amunt i avall, passant per aquells vells xarucs fets de pedra,
ajaguts d’esquena tan llargs com eren, a punt per passar revista.
b. Continuem caminant a desgana, baixem i pugem escales, passem per davant de tots aquests
individus fets de pedra, gitats panxa enlaire, fatigats, fets pols.
c. Així que vàrem continuar deambulant, uns passos amunt, uns passos avall. Vàrem passar
tots aquests homenots fets de pedra que estaven gitats cap amunt, tots tirats i dormint com
un tronc.
d. Així que, arrossegant els peus, continuem caminant esglaons amunt i esglaons avall per
davant de tots aquests vells xarucs de pedra que reposen boca per amunt, fora de combat.
e. Aleshores continuem caminant arrossegant els peus, amunt i avall unes quantes passes, per
davant de tots aquests homenots fets de pedra que estan gitats cara amunt, molt cansats,
fora per al compte.
2. I reckon that’s when it really happened, that’s when we really parted company, though it
wasn’t till later, till she teamed up with that Tyson toe-rag, then started taking on all-comers,
that I washed my hands altogether, did a Vic.
a. Supose que quan va succeir és quan, en realitat, vam dividir l’empresa, ja que no va ser fins
més tard, fins que ella es va associar amb aquell fanfarró d’en Tyson, quan va començar a
veure-se-les amb tots els adversaris, que em vaig llavar les mans del tot igual que Vic.
b. Crec que va ser llavors quan va passar, quan realment ens vam separar, encara que no va
ser fins un poc més tard, fins que es va associar amb aquell pocavergonya de Tyson i van
començar a prendre part tots els contendents, que em vaig desentendre, vaig fer com Vic.
c. Crec que va ser aleshores quan va passar, quan ens vam separar de veritat, encara que no
va ser fins temps després, quan ella es va ajuntar amb aquell mamarratxo musculat al més
pur estil Tyson, quan començaren a esdevenir totes les adversitats del món. Però jo no vaig
rentar-me les mans, com va fer Vic.
d. Crec que tot ve d’aleshores, sí, va ser aleshores que ens vam distanciar de debò, tot i que no
va ser fins més endavant, quan es va ajuntar amb el poca-vergonya i després va començar
a ajuntar-se amb el primer que arribava, que vaig rentar-me’n les mans de tot plegat, mans
netes com Vic.
Appendix 3. Tasks with now for the English-Catalan translation classroom
1. Identify as many meanings of now as you can in the following excerpts:
CJ2 332 A barely significant political force throughout the two-and-a-half years since its foun-
dation, the Falange now found its numbers swelling dramatically with disillusioned members
of the CEDA Youth.
CE5 25 Get up. Now!
CD9 903 We were glad he was getting out. Now, while the going was good.
CK6 2138 Like their record label’s slogan, Scissormen are “Independent, NOT indie”, and are de-
serving of your time. Now, can we please have a few more bands that intend to be this good?
H7U 1417 One other provision, the Control of Misleading Advertisements Regulations 1988,
will be considered in the present chapter. Now, the fact that a criminal offence is committed by
a trader or creditor does not confer upon the consumer any right to bring an action or obtain
redress in the civil courts.
H9V 1154 Took me ages to track her down. Now,” he continued, giving Hilary a warm smile, “it
was paint you wanted, wasn’t it?
JY8 4389 “End of lecture. Now, what is it you would like to do if I can’t persuade you to keep an
old man company?”
KD1 4786 Now, now, we’ve still got a lot to do.
2. Translate the passages in the previous task.
3. Translate the following extended passage, paying special attention to the role played by
now in developing the argument:
‘Show us the tap, and give us a bit of cold meat and a drop of beer while yer inquiring, will yer?’
said Noah.
Barney complied by ushering them into a small back-room, and setting the required viands be-
fore them; having done which, he informed the travellers that they could be lodged that night,
and left the amiable couple to their refreshment.
Now, this back-room was immediately behind the bar, and some steps lower, so that any person
connected with the house, undrawing a small curtain which concealed a single pane of glass
fixed in the wall of the last-named apartment, about five feet from its flooring, could not only
look down upon any guests in the back-room without any great hazard of being observed (the
glass being in a dark angle of the wall, between which and a large upright beam the observer
had to thrust himself), but could, by applying his ear to the partition, ascertain with tolerable
distinctness, their subject of conversation. The landlord of the house had not withdrawn his eye
from this place of espial for five minutes, and Barney had only just returned from making the
communication above related, when Fagin, in the course of his evening’s business, came into
the bar to inquire after some of his young pupils.
Appendix 4. References to texts in the COVALT corpus
Conrad, J. 1990. Typhoon, and Other Stories. London: Penguin.
Conrad, J. 1991. Tifó. Alzira: Bromera [translated by Remei Bataller].
De Quincey, T. 1971. Confessions of an English Opium-Eater. London: Penguin.
De Quincey, T. 1995. Confessions d’un opiòman anglés. Alzira: Bromera [translated by Enric
Sòria Parra].
Doyle, A. C. 1981. The Penguin Complete Sherlock Holmes. London: Penguin.
Doyle, A. C. 1992. Sherlock Holmes i els plànols del Bruce Partington. Alzira: Bromera [trans-
lated by Víctor Oroval Martí].
London, J. 1990. The Call of the Wild, White Fang, and Other Stories. Oxford: Oxford University
Press.
London, J. 1995a. El creuer del Dazzler. Alzira: Bromera [translated by Remei Bataller].
London, J. 1995b. La crida del bosc. València: Germania [translated by Tomasa Plata Vinuesa].
London, J. 1996. Clau blanc. Alzira: Bromera [translated by Josep Franco Martínez].
London, J. 2002. The Cruise of the Dazzler. Amsterdam: Fredonia Books.
Safeguarding the lexicogrammatical environment
Translating semantic prosody
Dominic Stewart
University of Macerata, Italy
This paper discusses a module taught to final year students at the School for
Interpreters and Translators at Forlì, University of Bologna, examining the role
of semantic prosody in translation from English to Italian and the way in which
we as corpus analysts use our intuitions about language to seek insights into se-
mantic prosody and to convert corpus data into evidence of semantic prosody.
These issues are considered primarily from the point of view of a teacher of
translation wishing to sensitise students to the opportunities afforded by cor-
pora for translators.
Key words: Translation teaching, semantic prosody, intuition, empirical data,
lexicogrammatical environment.
Introduction
These days, translation theorists advise us to go holistic. Translators should be
culture-aware, function-aware, register-aware, frequency-aware, ever alert to
context and purpose, to co-text, to source language and target language conven-
tions, requirements and restraints. This is already a lot to think about.
Perhaps less emphasised, or emphasised only implicitly, is the notion that the
beleaguered translator should be aware of a word’s habitual lexicogrammatical
environment. If we wish to translate the sentence ‘She sat through the opera’, we
should ideally be aware that the expression SIT through (something) is associated
with a prosody of boredom or discomfort (see Hunston 2002: 60–62), so that, if
appropriate, we can try to render the prosody in the target text. In this respect
semantic prosody must be seen as a reality that translators are required to address,
otherwise important source text elements will be left unaccounted for.
Yet, as Louw (1993) has pointed out, prosodies may be more subtle than this,
possessing an almost subliminal quality which may not be readily perceived by
the hearer/reader/translator. In this respect, corpora have been considered cru-
cial, enabling or helping us to identify, through corpus data, the lexical profile
of any given term. Since semantic prosody has been studied almost exclusively
within the domain of corpus linguistics, it seems legitimate in this context to raise
the question of just how corpora can provide translators with insights into se-
mantic prosody. Yet there is little research, within translation studies or corpus
linguistics, into how we as analysts actually read or interpret corpus data, i.e., how
we convert corpus data into evidence. In this regard, of considerable importance
is the role of the user’s intuitions in corpus investigations, particularly as the role
of intuition tends to be played down in studies on corpus linguistics, where it is often
considered speculative and untrustworthy by comparison with the tangible, empirical
world of computerised collections of data.
Two main issues thus emerge: the role of semantic prosody in the translation
process and how we intuitively convert corpus data into evidence of semantic
prosody. I shall consider these issues primarily from the point of view of a teacher
of translation wishing to sensitise students to the opportunities afforded by cor-
pora for translators.
The teaching in question, a module taught to final (fourth) year Italian stu-
dents at the School for Interpreters and Translators at Forlì, University of Bolo-
gna, involved corpus analysis of certain phrases and expressions drawn from a
passage by James Joyce, in order to identify possible semantic prosodies.
Section 1 of the present article furnishes a brief review of studies on semantic
prosody in both corpus linguistics and translation, while Section 2 outlines the
methodology adopted and gives the findings. Section 3 offers a discussion of the
methodology used, as well as critical reflections upon the way the corpus analysis
was carried out.
1. Studies on semantic prosody
1.1 Studies on semantic prosody in corpus linguistics
Over the last twenty years or so, semantic prosody has attracted considerable attention
within corpus linguistics. Interest in the subject was initially kindled in the
late 1980s by Sinclair’s observations about the lexicogrammatical environment
of the phrasal verb SET in, later reiterated in Sinclair (1991: 74). Using a corpus of
around 7.3 million words, the author makes the following observation about this
verb’s grammatical subjects:
The most striking feature of this phrasal verb is the nature of its subjects. In gen-
eral, they refer to unpleasant states of affairs … The main vocabulary is rot, decay,
malaise, despair, ill-will, decadence, impoverishment, infection, prejudice, vicious
(circle), rigor mortis, numbness, bitterness, mannerism, anticlimax, anarchy, dis-
illusion, disillusionment, slump. Not one of these is conventionally desirable or
attractive (ibid: 74–75).
Later in the same work the author (ibid: 112) notes, within the framework of his
idiom principle, that “Many uses of words and phrases show a tendency to occur
in a certain semantic environment. For example the word happen is associated
with unpleasant things – accidents and the like”.
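Observations like Sinclair’s come from reading concordance lines for a node such as SET in. A key-word-in-context (KWIC) display of that kind can be sketched in a few lines. This is a generic illustration, not the software Sinclair used. The regular expression for the inflected forms of SET in is an assumption: it captures only the contiguous pattern, and it will also catch non-phrasal sequences such as set in stone, which the analyst has to weed out by reading.

```python
import re


def kwic(text: str, node: str, width: int = 30) -> list[str]:
    """Key-word-in-context lines for every match of the regex `node`.

    Each line shows up to `width` characters of left and right context,
    with the matched node upper-cased and aligned in a single column.
    """
    text = " ".join(text.split())  # normalise line breaks and repeated spaces
    lines = []
    for m in re.finditer(node, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()].strip()
        right = text[m.end():m.end() + width].strip()
        lines.append(f"{left:>{width}} {m.group(0).upper()} {right}")
    return lines


if __name__ == "__main__":
    sample = "By evening the rot had set in. In the meantime decay sets in quickly."
    for line in kwic(sample, r"\bset(?:s|ting)? in\b", width=20):
        print(line)
```

Because the left context is right-aligned, the words immediately before the node line up vertically. Scanning that column is what allows an analyst to see, for example, that the subjects of SET in are predominantly unpleasant.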
Sinclair’s reading of semantic prosody is to be understood within his model
of the extended lexical unit, which integrates collocation, colligation, semantic
preference and semantic prosody. For example, in Sinclair (1996: 84–91) the author
analyses the lexical items (a) the naked eye, for which he posits a prosody of
‘difficulty’ on account of its frequent co-occurrence with sequences such as barely
visible to the, too faint to be seen with, invisible to, and (b) true feelings, for which
he claims a prosody of ‘reluctance’, i.e., reluctance to express our true feelings, on
account of co-occurrences such as will never reveal, prevents me from expressing,
less open about showing, guilty about expressing.
The pragmatic implications of semantic prosody are made explicit in the fol-
lowing:
A semantic prosody…is attitudinal, and on the pragmatic side of the semantics /
pragmatics continuum. It is thus capable of a wide range of realisation, because
in pragmatic expressions the normal semantic values of the words are not neces-
sarily relevant. But once noticed among the variety of expression, it is immedi-
ately clear that the semantic prosody has a leading role to play in the integration
of an item with its surroundings. It expresses something close to the ‘function’
of an item – it shows how the rest of the item is to be interpreted functionally.
(Sinclair 1996: 87–88)
The term ‘semantic prosody’ itself first gained currency in Louw (1993), and was
based upon a parallel with Firth’s discussions of prosody in phonological terms.
In this respect Firth was concerned with the way sounds transcend segmental
boundaries. The exact realisation of the phoneme /k/, for example, is dependent
upon the sounds adjacent to it. The /k/ of cat is not the same as the /k/ of key,
because during the realisation of the consonant the mouth is already making pro-
vision for the production of the next sound. Thus the /k/ of cat prepares for the
production of /æ/ rather than /i:/ or any other sound, by a process of “phonologi-
cal colouring” (Louw ibid: 158). In the same way, it has been claimed, an expres-
sion such as symptomatic of (ibid: 170) prepares (the hearer / reader) for what
32 Dominic Stewart
follows, in this case something undesirable (co-occurrences of symptomatic of in
the corpus used by Louw include parental paralysis, management inadequacies,
numerous disorders).
Phonemes are influenced by the sounds which precede them as well as those
which follow, and therefore the semantic analogy extends not only to words that
appear after the keyword, but more generally to the keyword’s close surrounds.
According to Louw (ibid: 159), “the habitual collocates of the form set in are ca-
pable of colouring it, so it can no longer be seen in isolation from its semantic
prosody, which is established through the semantic consistency of its subjects”.
Hence Louw’s (ibid: 157) definition of semantic prosody as a “consistent aura
of meaning with which a form is imbued by its collocates”, with its implications
of a transfer of meaning to a given lexical item from its habitual co-text. His ex-
amples of lexical items with prosodies include utterly, bent on and symptomatic of,
for all of which he claims negative prosodies.
The concept of semantic prosody is a contentious one – see Hunston (2007)
and Stewart (forthcoming) for a summary of the particular bones of contention.
Important contributions to the subject have also been made by Stubbs (1996,
2001), Partington (1998, 2004), Tognini-Bonelli (2001), Hunston (2002) and Whitsitt
(2005).
1.2 Studies on semantic prosody in translation
Over the last few years there have been a number of studies on semantic prosody
within a contrastive framework. These include Xiao and McEnery’s (2006) com-
parison of prosodies of near-synonyms across English and Chinese, and Berber-
Sardinha’s (2000) analysis of English and Portuguese, both of which conclude that
collocational behaviour and semantic prosodies of near-synonyms are unpredict-
able across the two language pairs, in some cases being quite similar and in others
quite different. What emerges clearly from the two studies, however, is that such
phenomena should receive far more attention in pedagogy (language teaching,
translation teaching, dictionary compilation) than is currently the case. Similar-
ly, in Munday’s (forthcoming) cross-linguistic analysis of semantic prosodies in
comparable reference corpora of English and Spanish, the author advocates more
earnest collaboration between translation studies theorists, monolingual corpus
linguists and software developers. He also makes the important point that corpus
data are particularly useful to translators (in this case, to translators working into
their mother tongue) because:
the translator may be aware of the general semantic prosody of target text al-
ternatives (since these are in his/her native language) even if he/she may be less
sensitive to subtle prosodic distinctions in the foreign source language.
Tognini-Bonelli (2001: 113–128, 2002) uses corpus data to compare semantic
prosody within analogous units of meaning in English and Italian, while Parting-
ton (1998: 48–64) claims that perfect equivalents across English and Italian are
few and far between because even words and expressions which are ‘lookalikes’
or false friends (e.g., to sanction vs. the Italian sanzionare, correct vs. the Italian
corretto) may have very different lexical environments.
2. The teaching module: Methodology and results
The module was taught to a class of 25 final-year students, all of whom were Ital-
ian. It consisted principally of the following:
– two lessons were devoted to textual analysis and discussion of a passage from
James Joyce’s story The Dead
– students were given a week to translate the passage into Italian
– following submission of their translations, a series of lessons were given on
semantic prosody and its possible relevance to the passage from Joyce, focus-
ing particularly, with the aid of corpus data, on three sentences or parts of
sentences from that passage
– students were then asked to re-translate the three sentences in the light of our
discussions of semantic prosody
– a comparison was then made of the translations ‘before and after’ the discus-
sions of semantic prosody, to see if awareness of prosodies had affected the
students’ translations in any way.
The passage analysed is the final part of Joyce’s The Dead, the last story in the
collection Dubliners. It is an extraordinarily lyrical, mournful, allusive passage,
heavy with symbolism, and this is in part why I chose it: aside from the beauty of
its language, its allusive nature seemed to provide a suitable springboard for the
analysis of semantic prosody. Further, the students were already studying Dublin-
ers as part of a literature course.
In the passage in question the character Gabriel is sitting in a dark hotel room
late at night after a party with family and friends in Dublin. His wife is asleep on
the bed while he reflects upon the past and present, and more specifically upon
the fact that many years before, his wife, as she has just revealed to him, had loved
another man before she met Gabriel. The sentences analysed in detail in the classroom,
set in bold in the original, are ‘The air of the room chilled his shoulders’, ‘Other
forms were near’ and ‘The time had come’:
The air of the room chilled his shoulders. He stretched himself cautiously along
the sheets and lay down beside his wife. One by one they were all becoming shades.
Better pass boldly into the other world, in the full glory of some passion, than fade
and wither dismally with age. He thought of how she who lay beside him had locked
in her heart for so many years that image of her lover’s eyes when he had told her
that he did not wish to live.
Generous tears filled Gabriel’s eyes. He had never felt like that himself towards
any woman, but he knew that such a feeling must be love. The tears gathered more
thickly in his eyes and in the partial darkness he imagined he saw the form of a
young man standing under a dripping tree. Other forms were near. His soul had
approached that region where dwell the vast hosts of the dead. He was conscious of,
but could not apprehend, their wayward and flickering existence. His own identity
was fading out into a grey impalpable world: the solid world itself which these dead
had one time reared and lived in was dissolving and dwindling.
A few light taps upon the pane made him turn to the window. It had begun to
snow again. He watched sleepily the flakes, silver and dark, falling obliquely against
the lamplight. The time had come for him to set out on his journey westward.
Once the students had submitted their initial translations, I introduced them to
the notion of semantic prosody, with the assistance of data from the British Na-
tional Corpus, using the three highlighted sentences / phrases. The methodology
I used was as follows:
i. The air of the room chilled his shoulders
I selected this sentence because it seemed to me to represent a switch of mood
within the text – the introduction or the signposting of a heavier, more sombre
atmosphere. At first glance it is beguilingly simple: on a syntactic level it consists
of the canonical SVO, and from a semantic point of view it seems equally uncom-
plicated: it is mid-winter, it is late at night, and not surprisingly Gabriel, who is
lost in thought and sitting immobile on the bed, begins to feel the cold. Yet it is
a sentence which puts the reader on the alert, conveying a mysterious sense of
foreboding.
I originally envisaged that this sense of foreboding was, at least in part, a re-
sult of some of the habitual co-occurrences of CHILL, whether as noun or verb.
However, since respective searches for the noun and the verb retrieved a lot of oc-
currences (705 as noun, 281 as verb), I immediately refined the search to make it
more similar to the sentence under analysis: the verb CHILL followed by any one
of the possessive adjectives my / your / his / her / its / our / their within a span of 5.
This produced a more manageable selection of 46 occurrences, though it included
cases of her as direct object of the verb. Relatively few of the occurrences carry the
physical meaning of ‘make cold’, for example:
It was bitterly cold. The air chilled his lungs. The ground was
washing in from off the water chilled his face, freshened him.
The great majority appear to convey the idea of fear, for instance:
the lack of warmth in the smile chilled her. ‘I’ve given much
The grim hostility in his eyes chilled her. ‘OK, I’ll explain.
with a sinking feeling that chilled her more than any explosion
as something in his tone that chilled her even more. It was the
In many of these (see Concordance 1) the element of fear is made explicit by
co-occurrences such as blood and marrow in particular, but is also suggested by
elements such as bone, heart, mind and soul.
His threat was enough to chill her blood. ‘Poor Paige’. It became
minister’s mind and preferably chill his blood. It is possible to
and saw something that chilled his blood. Like some animated corpse
It took them unaware, chilling their blood. The reply, a weird
correctly Rachel Mortimer, chilled my soul to the marrow.) The
was so damned determined it chilled her to the very marrow of her bones
rope, the garrotte – they don’t chill my heart. Poison, however, is
Are you here? The thought chilled his mind – ‘Are you a ghost?’
aggressive determination which chilled her to the bone. ‘Take care’,
over like some succulent titbit chilled her to the bone. Moving on
doing that?’ The cool threat chilled her to the bone. She swallowed
around her like an icy fist, chilling her to the bone. There had to be
Concordance 1. ‘chill=VERB followed by my / your / his / her / its / our / their’. Span 5.
Non-random selection of 12/46
In consideration of these co-occurrences, the data may be considered to suggest
that although ‘the air of the room chilled his shoulders’ is not obviously meta-
phorical, the common metaphorical use of CHILL remains in the background,
giving rise to a prosody of fear or terror, or even death. In their first translations
many of the students had not appreciated the importance of the metaphorical
element, rendering the sentence with expressions representing primarily the sen-
sation of cold:
L’aria della stanza gli raffreddò le spalle
L’aria della stanza gli ghiacciò le spalle
L’aria della stanza gli infreddoliva le spalle
L’aria della stanza rinfrescò le sue spalle
The retranslations of the sentence, however, focused more earnestly on expres-
sions which suggested not only cold but also fear:
L’aria della stanza gli gelò/gelava le spalle
L’aria della stanza lo fece rabbrividire
L’aria della stanza gli raggelò le spalle
The verbs gelare, raggelare and rabbrividire frequently occur in expressions of fear.
My students suggested parallel examples such as mi ha gelato il sangue ‘it froze my
blood’; mi sono sentito gelare il sangue ‘I felt my blood freeze’; si sono sentiti rag-
gelare il sangue nelle vene ‘they felt their blood freeze in their veins’.
ii. Other forms were near
I selected this sentence for analysis because once again I had the impression that
despite its ostensible simplicity on both a semantic and syntactic level, there was
something sinister, almost threatening about it, yet I was unable to identify with
any degree of precision what caused this impression. Initially I made a number of
apparently fruitless searches, all of them variations upon ‘other forms were near’.
These were (=SUBST means ‘all forms of the noun’; =VERB means ‘all forms of
the verb’; an asterisk means ‘followed by’):
‘other forms’
‘other form=SUBST’
‘form=SUBST were’
‘form=SUBST * be=VERB’ (span 5)
‘were near’
‘form=SUBST * near’ (span 5)
‘were near.’
‘be=VERB * near’ (span 5)
This went on for some time until I hit upon the search ‘be=VERB * near.’ (span 5),
i.e., any form of be followed by near followed immediately by a full stop within a
span of 5 words. This resulted in 74 hits, of which around 20 looked interesting for
the type of investigation I was conducting (see Concordance 2).
everyday life for the Lord is always near. The parables of Jesus promise
gather now that the last days are near.’ He looked directly at Morrsleib
plan is dead and the end may be near. ‘We don’t have a chance’, said
coming of Christ the time of Christ is near. It means that it’ll come sort
than you hate me. My own death is near. I shall leave this ship and go
an eagle loses hope then death is near.’ Her voice faded as her body
brochure Proclaiming that the end is near. Black diplomats with stately
be found. Call upon him while he is near. Let the wicked forsake his way
in God’s presence. Our salvation is near. As we wait for Christ to be
Messiah and that the end of the world is near. Federal agents bathed the cult
Messiah and that the end of the world is near. The cult has been barricaded in
and that to see if anyone was lurking near. ‘Ambush – you know, surprise
the main post. The bombers were very near. A crash in the direction of the
too, knew that their last hour was near. Many of them were the scourings
ended up with a newspaper. The end was near. It came in March with the
new organs, Kelly realised her time was near. ‘I was driving Kelly to
dangerous of all possible enemies, man, was near. After ten minutes on
the King that the Day of Judgement was near. It was only after the
who, at 70, knew that his end was near. Mr Gorbachev, you feel, is only
mad certainty that a day of reckoning was near. And underneath these
Concordance 2. ‘be=VERB followed by near followed immediately by a full stop’. Span 5. Non-random selection of 20/74.
It will be noticed that many of the grammatical subjects of BE refer to the end of
something, for instance the end of someone’s life, the end of the world, the end
of time. Left-hand co-occurrences include death, storm, dangerous, hate, lurking,
enemies, loses hope. More generally there is an archaic, biblical quality to the oc-
currences listed, with references to Christ, Jesus, salvation, the coming of Christ, let
the wicked forsake, the Day of Reckoning, the Day of Judgement, impending doom.
However, occurrences of this type represent only around a third of the total, so that
if one wished to suggest a prosody of doom or something similar, it would have to
be acknowledged that the prosody is not especially strong. At the same time, the
prosody connects powerfully not only with the lugubrious atmosphere of the final
pages of The Dead but also with that of the story as a whole.
The other BNC searches carried out (listed above) in order to investigate
‘other forms were near’, though similar, produced quite different concordances.
The presence of the full stop after near, for example, was crucial, since without it
there was a high percentage of more banal occurrences such as ‘…near the library’,
‘…near the school’, or ‘near’ with a place name. Further, the ‘sinister’ cases tend
to occur when BE is followed directly by near, i.e., when near is not qualified by,
for example, very, quite, reasonably, too etc., since the presence of a qualifier again
tends to produce more banal occurrences such as:
but most of the time, she’s fairly near. Efforts rewarded She will
weeks by then you Yeah yeah. It’s too near. So the thirtieth of September
There are exceptions to this, however, for instance:
of everyday life for the Lord is always near. The parables of Jesus promise
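To make the logic of a query such as ‘be=VERB * near.’ (span 5) concrete, the following Python sketch reimplements it over a toy tokenised text. It is a hypothetical illustration, not the BNC query software used in this study: the word list standing in for the lemma BE, the crude tokeniser, the function names and the sample sentences are all assumptions made for the example.

```python
import re

# Hypothetical stand-in for the lemma BE ("be=VERB"): a hand-listed set of
# word forms. A real corpus tool would rely on the corpus's own lemmatisation.
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}


def tokenize(text):
    """Crude tokeniser (an assumption): words, with punctuation as separate tokens."""
    return re.findall(r"[A-Za-z']+|[.,;:!?]", text)


def be_near_full_stop(tokens, span=5):
    """Find a form of BE followed, within `span` tokens, by 'near' that is
    immediately followed by a full stop. Returns (start, end) token indices,
    where `end` is the index of the full stop. Punctuation counts as a token."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() not in BE_FORMS:
            continue
        # Scan at most `span` tokens to the right of the node word.
        for j in range(i + 1, min(i + 1 + span, len(tokens))):
            if (tokens[j].lower() == "near"
                    and j + 1 < len(tokens) and tokens[j + 1] == "."):
                hits.append((i, j + 1))
                break
    return hits


def kwic(tokens, start, end, width=4):
    """Render one hit as a simple key-word-in-context line."""
    left = " ".join(tokens[max(0, start - width):start])
    match = " ".join(tokens[start:end + 1])
    return f"{left} [{match}]"


SAMPLE = ("He knew that his end was near. The library was near the school. "
          "Death is very near. The end may be near.")

tokens = tokenize(SAMPLE)
for start, end in be_near_full_stop(tokens):
    print(kwic(tokens, start, end))
```

The full-stop condition is what screens out the locative ‘was near the school’. Calling `be_near_full_stop(tokens, span=1)` keeps only cases where BE is followed directly by near, so the qualified ‘is very near.’ drops out, mirroring the observation above that the ‘sinister’ cases tend to occur when near is not qualified.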
The students’ initial translations provided a range of solutions, most of which
prioritised the element of physical proximity:
C’erano altre forme vicino
Vicino c’erano altre forme
Altre forme erano lì vicino
Altre forme erano presenti
Altre figure erano nei paraggi
In their retranslations the group tried to account for the prosody discussed, fa-
vouring primarily
Altre forme/figure/sagome erano vicine,
in part because according to the students the syntax of this solution, with vicine
functioning as an adjective at the end of the sentence, recalls expressions such as
la morte è/era vicina ‘death is/was near’.
Some students tried to reproduce the archaic quality of the sentence by re-
placing vicino with accanto, which also means ‘near’:
Altre forme gli erano accanto
Accanto a lui vi erano altre forme
However, it was acknowledged that accanto was not ideal in that it appears to be
associated with pleasant rather than unpleasant states of affairs.
iii. The time had come (for him to set out on his journey westward)
I selected this phrase for analysis because it reminded me of expressions such as
‘my time has come’, ‘his time had come’, which I had introspectively considered to
refer to circumstances of impending death, as in ‘I thought my last moment had
come’. I searched the BNC using the simple queries ‘time has come’ and ‘time had
come’, confidently expecting that my introspection would be supported by em-
pirical data. Yet this was not the case at all. As far as I could make out, suggestions
of death were extremely few and far between, for example:
I thought me time had come to be quite honest. I mean, how frightened were you
The BNC data (173 hits for ‘time has come’ and 91 for ‘time had come’) suggest
that the two expressions are associated with change, with personal resolve and with
a new beginning, but not noticeably with death or destiny. Having presented the
data to the class and discussed it in some detail, I then informed the students that
notwithstanding the corpus data, I remained convinced that the string ‘the time
had come’ in the passage examined is unsettling and ominous, in part because it
links up with the death-ridden implications of the “journey westward” (towards
the setting sun) and indeed of The Dead as a whole.
Many of the initial translations had included:
era arrivato il momento/il tempo…
era venuto il tempo/momento…
era arrivata l’ora…
In the retranslations there was a marked tendency to try to incorporate the sup-
posedly sinister associations of ‘the time had come’, in particular via the use of the
verb giungere, which was considered to convey a more archaic, biblical quality:
era giunto il momento/il tempo
era giunta l’ora
For reasons of time the module did not include work with corpora of Italian,
which might well have proved useful as a check on the students’ introspective
considerations about their native language, but the principal objective was to cre-
ate awareness of the potential usefulness of corpus data in their foreign language,
where their introspections would be less reliable.
3. Discussion of the methodology adopted
3.1 Intuition
The final lesson of the module was devoted to a critical discussion of the corpus in-
vestigations conducted, i.e., how revealing the searches were in identifying seman-
tic prosody, and to what degree they helped to fill intuitive gaps in our knowledge
of language. These are important questions because in corpus studies, above all in
the context of semantic prosody, intuition gets a bad press, often being described as
‘unreliable’, ‘inaccurate’, ‘chancy and unreliable’, ‘notoriously thin’ and ‘a poor guide’
(to semantic prosody). For example, Xiao and McEnery (2006: 103) begin their
study of semantic prosody in English and Chinese with the following premise:
We knew that our approach should be corpus-based as previous studies have
shown that a speaker’s intuition is usually an unreliable guide to patterns of col-
location and that intuition is an even poorer guide to semantic prosody.
Channell (2000: 39) aims to demonstrate that “analysis of evaluation can be re-
moved from the chancy and unreliable business of linguistic intuitions and based
in systematic observation of naturally occurring data”, since corpus-based analy-
ses can reveal evaluative functions “which intuitions fail to pick up”.
According to Louw (1993: 173):
It may well turn out to be the case that semantic prosodies are less accessible
through human intuition than most other phenomena to do with language …
corpus linguistics reveals a greater and greater mismatch between the products of
introspection about language and those of direct observation.
In the remainder of this paper I shall consider the role of intuition in corpus
investigations with reference to the searches conducted above. Although my
students did not spontaneously dispute the way I had extracted and presented
the data – indeed they appeared to accept the validity of the corpus findings un-
questioningly – it could be argued that the searches entailed some highly suspect
empirical methodology.
3.2 Summary of the investigations carried out
The following is a schematic summary of the approach I adopted in order to investigate each sentence:
i. The air of the room chilled his shoulders
– personal insight (i.e., CHILL is associated with a prosody of fear)
– search for CHILL as verb/noun
– more specific search for CHILL as verb followed by possessives
– initial insight apparently supported by data
– end of investigation
ii. Other forms were near
– indefinite ‘hunch’
– no support from initial searches
– numerous further searches
– hunch finally supported by specific search
iii. The time had come (for him to set out on his journey westward)
– personal insight
– barely any support from searches
– corpus data overruled by personal insight
Note. In studies on corpus linguistics, intuition and introspection often appear to be regarded as
one and the same thing, i.e., the ability to reflect upon and make judgements about language
unassisted by further data. For reasons of space I shall not question this in the present context,
though elsewhere (Stewart, forthcoming) I have argued that it would be helpful – as far as pos-
sible – to keep intuition and introspection separate.
3.3 Implications: Geared searches?
3.3.1 Beginning a search

There is much talk in corpus linguistics about skewed data attendant upon unbal-
anced or unrepresentative corpora, but in the above investigations it might be
argued that it is not the corpus builder who skews the data, but the corpus analyst,
i.e., analysts consciously or unconsciously ‘gear’ searches in such a way as to find
what they seek.
In a sense this is unavoidable. We do not open a corpus and start reading it
as we might our daily newspaper. There has to be an initial proactive move, a first
search which activates the visualisation of text, which gets the ball rolling. And
to do this, the analyst has to select a specific search word or words, something
which is already potentially destabilising inasmuch as it carries a risk of bias. The
very fact of making a search in the first place is almost tantamount to making a
statement about what we expect or hope to find. This is because we do not make
searches out of the blue – we test an idea, an expectation, a hunch. For example,
in order to investigate the sentence ‘The air of the room chilled his shoulders’, I
searched for the verbal lexeme CHILL followed by forms of the possessive adjec-
tive. But why did I choose this? I could have made any number of searches, e.g.:
– ‘chilled his shoulders.’
– ‘chilled his’
– ‘his shoulders’
– ‘his shoulders.’
– ‘chilled’
– ‘shoulders’
– ‘shoulders.’
– ‘chill=VERB’
– ‘chill=VERB * his shoulders’ (For the queries including an asterisk (‘followed
by’) I am assuming a span of something like 4–5.)
– ‘chill=VERB * his’
– ‘chill=VERB * his/her/your etc. shoulders’
– ‘chill=VERB * his/her/your etc.’
– ‘chill=VERB * his/her/your etc. shoulder=NOUN’
– ‘chill=VERB * his/her/your etc. shoulder=NOUN.’
These are only the ‘one way’ queries, but any number of two-way queries would
also have been possible. Further, I investigated CHILL only as a verb: was I right
to exclude the noun (e.g., catch a chill) and the adjective (a chill wind)? And I took
no account whatsoever of the opening of the sentence – ‘the air of the room’. The
same goes for my searches ‘time has come’ and ‘time had come’. Why did I restrict
myself to the present perfect and past perfect? Why did I not extend the investiga-
tion to ‘my time came’, ‘my time comes’, ‘my time would come’, ‘my time would
have come’, or to progressive forms such as ‘my time is coming’, ‘my time has been
coming’, ‘my time was coming’ etc.? And why did I not look at negative and inter-
rogative forms? One must assume that on an intuitive level I considered these less
likely to produce interesting results.
It could thus be claimed that my searches were massively influenced by my
personal insights and geared from the outset.
3.3.2 Continuing a search

If we use a corpus to test a hypothesis, we will naturally prioritise data which seem
relevant to that hypothesis, and ignore data which do not. In other words, we rap-
idly convert the relevant-looking data into evidence of something, perhaps to the
exclusion of other, albeit significant, data. Looking for a familiar face, any familiar
face, in a crowd is very different from simply looking at a crowd. If I am looking
for a familiar face in a crowded room, I will probably only glance at the other faces
long enough to establish that they do not appear to be of immediate relevance
to me. And if I do not find a face I may go and look in another crowded room,
and another, until I find one. Of course there may be a number of familiar faces
in those crowded rooms whose relevance to me I fail to recognise because they
are not the faces I want to see at that particular time. This was, mutatis mutandis,
the strategy I adopted in connection with ‘other forms were near’. I pursued my
investigations until I ran across something which looked familiar or relevant to
me, and systematically eliminated anything which did not. Such decisions were
clearly based upon my intuitive reactions to the data.
3.3.3 Ending a search

My investigations into ‘The air of the room chilled his shoulders’ and ‘Other
forms were near’ ended fairly abruptly once I had found data which I considered
to support my personal insights. Whether this has any theoretical justification is
a moot point. Have we the right, or better, is it empirically sound to terminate
an investigation simply because we have found something which supports our
intuitive reactions? There may in reality be a good deal more data waiting to be
discovered via further searches.
Of course there is also the other side of the coin – how empirically sound is it
to terminate a search when we do not find something which supports our intui-
tive reactions, as in the investigation of ‘the time had come’ above?
3.3.4 Drawing conclusions from the search

In the first two investigations above, I accepted the corpus data once it had more
or less supported my intuitions. In the third search (‘the time had come’) I over-
ruled the data when it did not support my intuitions.
Louw (1993: 173), when discussing the way we use corpora as evidence for se-
mantic prosody, called for “constant exposure, of the most humbling kind, to real
examples”. Certainly ‘humbling’ or ‘humble’ are not words that spring to mind
when I think of the way I conducted my searches and the conclusions I drew from
them. ‘Arrogant’ would perhaps be more appropriate.
4. Summing up: Corpus use and learning to translate
Naturally I would not wish to suggest that all corpus analysts adopt the search
methodologies outlined above, but the investigations conducted raise some
points which are by definition germane to CULT conferences, i.e., to corpus use,
pedagogy and translation:
– corpus users should perhaps be wary of overplaying the empirical card and
relegating intuition to the wings of corpus investigations. It could quite jus-
tifiably be argued that almost all corpus investigations, from beginning to
end, are heavily reliant upon the user’s intuitions about language, determin-
ing why we decide to consult a corpus in the first place, how we formulate
our search, how we react to the data, the criteria we use to select or ‘reverse
select’ (eliminate) concordance lines, how far we take the investigation, why
we terminate the investigation, and how we draw conclusions from the data.
Indeed, without users’ intuitions, one imagines that most corpus searches
would be relatively unproductive, if they managed to get off the ground at all.
Much has been made of the notion that semantic prosody is ‘invisible to the
naked eye’, covert, subliminal etc., but presumably there must be some sort of
intuitive trigger which activates the corpus search in the first place and which
determines how we handle the data. It is thus surprising that intuition is often
stigmatised as unreliable in corpus studies. For further discussion see Stewart
(forthcoming).
– teachers should be careful about how corpus data are presented to students. I
would underline once again that although I informed my (final-year) students
that I considered my own methodology to be questionable, and although I
urged my students to make criticisms of it, not one of them independently
came up with any cogent reason to dispute the validity of the procedures ad-
opted. It may be that generally speaking students continue to labour under
the delusion that the teacher is always right, but perhaps more insidious in
the current context is the delusion that the contents of a corpus, since they
are ‘real’, are always ‘right’. Students are likely to be so blinded by the extraor-
dinary abundance and availability of the data that it may not occur to them to
question the strategies used to reach and to interpret those data.
– learners, since by definition they have fewer intuitions about the foreign lan-
guage than native speakers of that language, are more likely to approach the
foreign-language data with an open mind, with a clean slate, so to speak, and
may thus have a better chance of letting the corpus data speak for themselves.
At the same time, if one takes the view that corpus investigations are inex-
tricably bound up with intuitive user behaviour, then it could be argued that
the fewer intuitions one has, the more problematic it is to make productive
searches. It may not occur to the learner, or to the non-native speaker in gen-
eral, that there could be anything of prosodic interest in expressions such as
‘chilled his shoulders’, ‘other forms were near’ and ‘the time had come’. If this
is the case, the learner will either not explore the prosody at all, or make blind
searches which could prove fruitless and frustrating.
– translators may find themselves in a quandary. If corpus investigations can
help translators, are we actually willing to be helped by them, or will we do
no more than select the data that best suit or confirm our own perceptions /
preconceptions? (see Tymoczko 1998: 657–658). Further, there is the risk that
corpus evidence of something as mercurial as semantic prosody may actually
dishearten translators, serving only to lend weight to the notion of a supposed
impossibility of translation. Take the prosody of ‘doom’ associated with BE *
near at the end of a sentence. We have only just taken on board the notion that
different forms of a single lexeme will have different lexicogrammatical envi-
ronments (see in particular Sinclair 1991), but words with differing prosodies
according to their position in the phrase or sentence? The need to condense
all this information in the target language might be enough to make some
translators throw in the towel. Further, since all words have characteristic en-
vironments, it follows that all words are potentially associated with particular
prosodies. Indeed it was this that most concerned my group of students. The
concern was not so much that the prosodies of doom/fear/death could not be
adequately rendered in Italian, but that the type of language used by an author
such as Joyce is likely to be so teeming with prosodies, whether hidden or
not, that discovering and doing justice to all of these would be a mighty and
ultimately implausible task.
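A search of the kind discussed above, for BE * near at the end of a sentence, can be approximated with a regular expression over raw text. The sketch below is a hypothetical illustration, not the procedure used with the students: it lists the forms of BE by hand, reads the wildcard * as zero or one intervening word (so that ‘other forms were near’ is also caught), and treats ., ! or ? directly after near as the end of a sentence. A lemmatised, sentence-split corpus queried with a proper concordancer would be more reliable.

```python
import re

# Forms of the lemma BE, listed by hand (an assumption: a lemmatised corpus
# would allow the lemma to be queried directly).
BE_FORMS = r"(?:am|is|are|was|were|be|been|being)"

# A BE form, then optionally one intervening word (our reading of '*'),
# then "near" immediately followed by sentence-final punctuation.
BE_X_NEAR_FINAL = re.compile(
    rf"\b{BE_FORMS}\s+(?:\w+\s+)?near\s*[.!?]", re.IGNORECASE
)


def find_be_x_near(text, context=30):
    """Return (left_context, match) pairs, KWIC-style, for sentence-final BE * near."""
    hits = []
    for m in BE_X_NEAR_FINAL.finditer(text):
        left = text[max(0, m.start() - context):m.start()]
        hits.append((left, m.group(0)))
    return hits


sample = "Other forms were near. He was near the door. The end is very near!"
for left, hit in find_be_x_near(sample):
    print(f"{left:>30} | {hit}")
```

Note that ‘He was near the door’ is not retrieved, because near is not sentence-final there.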
I am beginning to feel that with this paper I am conveying more prosodies of doom than Joyce himself, but my wish is simply to advocate careful and critical
discussion of the themes introduced, or rather, to countenance a cult of caution
in CULT.
References
Berber-Sardinha, T. 2000. “Semantic prosodies in English and Portuguese: A contrastive study.” Cuadernos de Filología Inglesa (Universidad de Murcia, Spain) 9(1): 93–110. Available online at: http://216.239.59.104/search?q=cache:Fy2Dx78mDJQJ:lael.pucsp.br/~tony/2000murcia_prosodies.pdf+berber+sardinha+semantic+prosodies&hl=en
Channell, J. 2000. “Corpus-based analysis of evaluative lexis.” In Evaluation in Text: Authorial Stance and the Construction of Discourse, S. Hunston and G. Thompson (eds), 38–55. Oxford: Oxford University Press.
Hunston, S. 2002. Corpora in Applied Linguistics. Cambridge: Cambridge University Press.
Hunston, S. 2007. “Semantic prosody revisited.” International Journal of Corpus Linguistics 12 (2): 249–268.
Louw, B. 1993. “Irony in the text or insincerity in the writer? The diagnostic potential of semantic prosodies.” In Text and Technology: In Honour of John Sinclair, M. Baker, G. Francis and E. Tognini-Bonelli (eds), 157–175. Amsterdam/Philadelphia: John Benjamins.
Munday, J. (forthcoming). “Looming large: A cross-linguistic analysis of semantic prosodies in
comparable reference corpora.”
Partington, A. 1998. Patterns and Meanings: Using Corpora for English Language Research and
Teaching. Amsterdam/Philadelphia: John Benjamins.
Partington, A. 2004. “‘Utterly content in each other’s company’: Semantic prosody and semantic preference.” International Journal of Corpus Linguistics 9 (4): 131–156.
Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Sinclair, J. 1996. “The Search for Units of Meaning.” Textus 9: 75–106.
46 Dominic Stewart
Stewart, D. (forthcoming). Semantic Prosody: A Critical Evaluation. London/New York: Routledge.
Stubbs, M. 1996. Text and Corpus Analysis: Computer-assisted Studies of Language and Culture.
Oxford: Blackwell.
Stubbs, M. 2001. Words and Phrases: Corpus Studies of Lexical Semantics. Oxford: Blackwell.
Tognini-Bonelli, E. 2001. Corpus Linguistics at Work. Amsterdam/Philadelphia: John Benjamins.
Tognini-Bonelli, E. 2002. “Functionally complete units of meaning across English and Italian: Towards a corpus-driven approach.” In Lexis in Contrast. Corpus-based Approaches, B. Altenberg and S. Granger (eds), 73–95. Amsterdam/Philadelphia: John Benjamins.
Tymoczko, M. 1998. “Computerized corpora and the future of Translation Studies.” In The Corpus-Based Approach. Special Issue of Meta, S. Laviosa (ed.), 43(4), 652–660.
Whitsitt, S. 2005. “A critique of the concept of semantic prosody.” International Journal of Corpus Linguistics 10(3): 283–305.
Xiao, Z. and McEnery, A. 2006. “Near synonymy, collocation and semantic prosody: A cross-linguistic perspective.” Applied Linguistics 27(1): 103–129. (Also available at: www.lancs.ac.uk/postgrad/xiaoz/papers/collocation.doc).
Are translations longer than source texts?
A corpus-based study of explicitation
Ana Frankenberg-Garcia
Instituto Superior de Línguas e Administração, Lisboa,
Fundação para a Computação Científica Nacional, Portugal
Explicitation is the process of rendering information which is only implicit in the source text explicit in the target text, and is believed to be one of the universals of translation (Blum-Kulka 1986, Olohan and Baker 2000, Øverås 1998, Séguinot 1988, Vanderauwera 1985). The present study uses corpus technology to shed some light on the complex relationship between translation, text length and explicitation. An awareness of what makes translations longer (or shorter) and more explicit than source texts can help trainee translators make more informed decisions during the translation process. This is felt to be an important component of translator education.
Key words: Explicitation, translation universals, corpora, text length, translator education
1. Introduction
What translators should and shouldn’t do with texts has been a matter of controversy since Cicero (and later St. Jerome) first made reference to the word-for-word versus sense-for-sense dichotomy. In recent years, however, there has been a change of emphasis in translation studies away from the debate over what translators ought to do and towards descriptive studies of what practicing professional translators generally do. The shift of focus is beneficial to translator education. Instead of being swamped with prescriptive dos and don’ts, trainee translators who are made aware of regular features of translated texts can use this knowledge to make their own conscious and informed decisions during the translation process.
The present study uses corpus technology to revisit one of the more widely
discussed characteristics of translated texts: the phenomenon of explicitation.
Unlike previous studies, however, an attempt is made here to analyse explicitation from the perspective of text length. The relationship between translation, explicitation and text length is not simple, and in this study I try to shed some light on the complexity of the matter. In particular, I wish to draw attention to the difficulties of comparing text length across languages, to what happens to word counts in bi-directional analyses of comparable source texts and translations, and to how explicitation appears to be an intrinsic feature of translation even when translations do not have more words than source texts. The analysis carried out in the present study would not have been possible without recourse to corpora, and it is hoped that the results obtained can inform translator education and translation practice.
2. Explicitation
Explicitation is the process of rendering information which is only implicit in the source text explicit in the target text (Vinay & Darbelnet 1958). Explicitation is obligatory when the grammar of the target language forces the translator to add information which is not present in the source text, but it can also occur voluntarily when, for no grammatically compelling reason, translators distance themselves from the source text in a way that makes the target text easier to comprehend. Example (1) below illustrates the obligatory explicitation of gender in the translation of English into Portuguese.
(1) EBJT2 2038
Source Frances liked her doctor.
Translation Frances gostava dessa médica.
Back translation Frances liked this female doctor.
As Portuguese is marked for gender, the translator in example (1) was forced to discriminate between a female and a male doctor. Obligatory explicitation can also occur in the reverse direction. Example (2) illustrates two different aspects of obligatory explicitation in the translation of Portuguese into English. First, while the Portuguese possessive pronoun sua agrees with the possessed noun pele, the equivalent her in English agrees with the possessor. This means that while the Portuguese reader has no means of telling that the skin in the text belongs to a female, the English translator was forced to make the connection explicit. Second, since Portuguese is a pro-drop language, the reader will read on and still not know whether the person whose nose is ‘the most voluminous one in the world’ is a man or a woman. As English is not a pro-drop language, the translator had to insert the pronoun she, making it once again clear to the reader that the person in question is female.
Note: All examples were taken from the COMPARA corpus, available at www.linguateca.pt/COMPARA/. Letter and number codes identify the source/translation pair plus the alignment unit in question.
(2) PBMR1 575
Source […] sua pele lembrava a crosta lunar e tinha o nariz mais
volumoso do mundo […]
Literally […] his/her skin reminded one of the lunar crust and Ø had
the most voluminous nose in the world […]
Translation […] her skin resembled the lunar crust and she had the most
voluminous nose in the world […]
In contrast to obligatory explicitation, voluntary explicitation is not dictated by the grammar of the target language. It can be the result of a conscious decision to make the target text easier to understand, or even of a subconscious operation inherent to the process of translation. In example (3), the translator introduced the adverb so at the beginning of the English sentence, although it is not present in the Portuguese source text, nor is there anything about the grammar of English that makes it compulsory. The effect is that the connection between the event described by that sentence and a previous one in the text is made more explicit in the translation.
(3) PBAD1 435
Source Você também gosta dela?
Literally You like her too?
Translation So you like her too?
As shown in example (4), exactly the same can occur in the translation of English
into Portuguese.
(4) EBDL3T2 799
Source “It’s probably Rummidge.
Translation – Então é provável que seja Rummidge.
Back Translation “So it’s probably Rummidge.
Voluntary explicitation is being used here as an all-embracing term that covers all
explicitation that is not obligatory, from the explicitation of syntactically optional
elements and markers of cohesion to the explicitation of cultural information. In
example (5), the translator made the interrogative form more explicit by adding
a question beginning that was not present in the source text, and used a footnote
to add information about the use of a quote from Shakespeare and even about
Shakespeare’s birthplace.
(5) EBDL3T2 332
Source «All’s Well That Ends Well? » he snaps back, quick as a
flash.
Translation – Será que é All’s well that ends well ?* – ele diz rápido como
um relâmpago.
*Translation Note: Tudo está bem quando acaba bem é o
título de uma peça de Shakespeare, que nasceu em
Stratford-upon-Avon.
Back Translation Could it be All’s well that ends well? – he says quick as a
flash.*.
*Translation Note: All’s well that ends well is the title of a
play by Shakespeare, who was born in Stratford-upon-
Avon.
Similarly, in example (6), the translator added a subject and a verb which had
been implicit in the source text, introduced the first name of the poet referred to
only by his last name in the source text, and inserted a footnote to explain who the
poet was and the title of his great epic poem.
(6) PBAA2 47
Source Em pequeno meteram-lhe na cabeça vários trechos do Camões
[...]
Literally When young they put in his head various passages of Camões
[...]
Translation When he was young, someone had crammed various passages
of Luís de Camões into his head[...]*
*Translation Note: Luís de Camões (1524–80) – Portugal’s
national poet; wrote Os Lusíadas (1572).
There is abundant evidence of voluntary explicitation in the literature on translation studies. Vanderauwera (1985), for instance, described numerous examples in the English translation of Dutch novels. Blum-Kulka (1986) found cohesive devices in Hebrew translations that were not present in English source texts. Séguinot (1988) found non-obligatory connectives in translations from English into French and from French into English. Based on studies such as these, voluntary explicitation has come to be viewed as one of the universals of translation (Vanderauwera 1985) and as something inherent to the nature of the translation process (Séguinot 1988). After a systematic study of the phenomenon from a perspective of discourse, Blum-Kulka (1986) put forward the explicitation hypothesis, which holds that translations tend to be more explicit than source texts, regardless of the increase in explicitness dictated by language-specific differences.
In the early 1990s, Baker (1993) predicted that qualitative studies such as the above could be greatly enhanced by quantitative, corpus-based analyses of translations. Indeed, Øverås (1998) examined explicitation and implicitation shifts in the English-Norwegian Parallel Corpus, and found that there was more explicitation than implicitation in both Norwegian translated from English and English translated from Norwegian. Using two comparable corpora, Olohan and Baker (2000) analysed the insertion of the optional that following the reporting verbs say and tell in data from the Translational English Corpus (TEC) and the British National Corpus (BNC), and found that the optional that is used more frequently in the English translations from the TEC than in the English originals from the BNC.
The present study is an attempt to analyse voluntary explicitation from the
perspective of text length. Because voluntary explicitation is generally achieved
by the addition of extra words in the translated text, this study seeks to test
whether translations are likely to be longer than source texts, regardless of the
languages concerned. Using the COMPARA corpus (Frankenberg-Garcia and
Santos 2003), the length of original English and Portuguese language literary
text extracts was compared with the length of their respective translations into
Portuguese and English.
3. Text length in COMPARA 5.2
COMPARA is a free, online, parallel, bi-directional and extensible corpus of English and Portuguese literary texts, currently in version 10.1.3, with around 3 million words. In this study, an earlier version of the corpus was used. Version 5.2, accessed in November 2003, contained 37 source texts (25 in Portuguese and 12 in English) and 40 translations (the corpus admits the alignment of more than one translation per source text). The text extracts varied from just under 2000 to over 42000 words. The work of twenty-seven different authors and thirty-one different translators was represented, with some authors and translators being represented more than once. The overall distribution of Portuguese and English words in COMPARA at the time is summarized in Table 1.
Table 1. Distribution of Portuguese and English words in COMPARA 5.2
Language      Source texts   Translations
Portuguese    388452         384285
English       388430         431691
These figures indicate that while the English translations in the corpus contained overall about 11% more words than their source texts in Portuguese, the Portuguese translations contained about 1% fewer words than their source texts in English. However, all that these numbers can tell us is that translators working from Portuguese into English will probably earn more if they base their fees on the number of words in the translated text, while those working from English into Portuguese might be better off if they get paid by the number of words in the source text. This distribution of words does not shed any light on the relationship between translation and explicitation, for it is impossible to tell the extent to which the differences observed are due to differences between Portuguese and English or to differences between source texts and translations.
Note: COMPARA is available at http://www.linguateca.pt/COMPARA/.
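The two percentages can be recomputed directly from the Table 1 totals. A minimal sketch; the variable and function names are ours, and it relies only on the corpus design described above, in which the English translations derive from the Portuguese source texts and vice versa.

```python
# Word totals from Table 1 (COMPARA 5.2).
pt_source, pt_translations = 388452, 384285
en_source, en_translations = 388430, 431691


def pct_change(source_words, translation_words):
    """Percentage change in word count from source texts to their translations."""
    return 100 * (translation_words - source_words) / source_words


# Portuguese sources were translated into English, English sources into Portuguese.
pt_to_en = pct_change(pt_source, en_translations)
en_to_pt = pct_change(en_source, pt_translations)
print(f"PT>EN: {pt_to_en:+.1f}%   EN>PT: {en_to_pt:+.1f}%")  # → PT>EN: +11.1%   EN>PT: -1.1%
```

As the paragraph stresses, each of these figures mixes a language effect with a translation effect, so neither says anything about explicitation on its own.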
3.1 Text length across languages
Claims about the relative length of texts across languages are extremely difficult to put to the test. In a recent discussion on the Corpora List, there were over twenty postings on the subject. The main problem seems to be that, because of the diverging lexico-grammatical characteristics of languages, it is hard to decide which measure to use: different measures will affect different languages differently. If text length is measured in terms of number of words, for example, it is not hard to see that whatever the criteria for counting words are, they might make some languages seem lengthier than others. Table 2 illustrates this by means of a few examples of how word processors count equivalent meanings in Portuguese and English.
Table 2. Word processor word counts in English and Portuguese
English                Portuguese
isn’t (1)              não é (2)
teapot (1)             bule de chá (3)
gave him (2)           deu-lhe (1)
Did you like it? (4)   Gostou? (1)
As can be seen, English allows for contractions like isn’t, which are not possible in Portuguese: não é. A word processor counts the former as one word and the latter as two words. Even if contractions were to be counted as separate words, however, there are other problems. For example, there are many compound words in English, like teapot, which have to be written separately in Portuguese: bule de chá. But then not everything in English is more economical than in Portuguese. Portuguese clitics are often attached to verbs, so that what are separate words in English, like gave him, count as a single word in Portuguese: deu-lhe. Also, because Portuguese is a pro-drop language, it is often the case that only one word is required to say things that would take three or four words in English. For example, to ask the four-word question Did you like it? in Portuguese, only one word is required: Gostou?
Note: The Corpora List is available at http://helmer.aksis.uib.no/corpora/.
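The word counts in Table 2 can be reproduced with the simplest possible rule. This sketch assumes that a word processor counts whitespace-separated tokens; real word processors differ in their exact rules, so this is an approximation that happens to match the figures above, not a description of any particular program.

```python
# Equivalent pairs from Table 2 (English, Portuguese).
PAIRS = [
    ("isn’t", "não é"),
    ("teapot", "bule de chá"),
    ("gave him", "deu-lhe"),
    ("Did you like it?", "Gostou?"),
]


def word_count(text):
    """Approximate a word processor: count whitespace-separated tokens, so
    apostrophes and hyphens (isn’t, deu-lhe) do not split a word."""
    return len(text.split())


for en, pt in PAIRS:
    print(f"{en} ({word_count(en)})  vs  {pt} ({word_count(pt)})")
```

Under this rule the direction of the imbalance flips from pair to pair, which is exactly why raw word counts cannot be compared across the two languages.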
This is not the place for an extensive contrastive analysis of the lexico-grammatical characteristics of the two languages. The examples seen, however, show that word counts per se are not enough to compare text length across languages, let alone analyse the relationship between translation and explicitation. In fact, as example (7) indicates, a translation can be more explicit than a source text even when it has fewer words.
(7) EBDL1T1 670
Source What have I got to complain about? (7 words)
Translation De que me queixo então? (5 words)
Back Translation What have I got to complain about then?
Conversely, example (8) illustrates how there can be an increase in words in trans-
lation without any explicitation whatsoever:
(8) PBRF1 1299
Source Fui visitá-lo. (2 words)
Literally I went to visit him.
Translation I went to visit him. (5 words)
Some postings on the Corpora List argue that character counts constitute a better
measure for comparing text length across languages inasmuch as they disregard
the morphological and syntactic problems of word counts. However, as shown in
Table 3, equivalent meanings in two languages can also vary in terms of character
length. Differences in the number of characters in source texts and translations
cannot therefore help to analyse the question of explicitation any more than word
counts can.
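The point about character counts can be illustrated with the same Table 2 pairs. This is only an illustration using pairs we picked from Table 2, not a reconstruction of Table 3. Characters are counted after Unicode NFC normalisation, an assumption made so that an accented letter such as á counts as one character whichever way it was typed.

```python
import unicodedata

# Equivalent pairs from Table 2 (English, Portuguese).
PAIRS = [
    ("isn’t", "não é"),
    ("teapot", "bule de chá"),
    ("gave him", "deu-lhe"),
    ("Did you like it?", "Gostou?"),
]


def char_count(text):
    """Characters including spaces and punctuation, after NFC normalisation."""
    return len(unicodedata.normalize("NFC", text))


for en, pt in PAIRS:
    print(f"{en:<18}{char_count(en):>3}   {pt:<14}{char_count(pt):>3}")
```

The pairs do not line up consistently: isn’t and não é tie at five characters, but bule de chá is almost twice as long as teapot, while Gostou? is less than half the length of Did you like it?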
Another method for comparing text length across languages suggested in the
discussion list is morpheme counts. Indeed, as can be seen in Table 4, counting
the number of morphemes of equivalent meanings in two different languages
does seem to flatten out many of the problems of word and character counts.
Table 3. Character counts (with spaces) in English and Portuguese

English Portuguese
Table 4. Morpheme counts in English and Portuguese

English Portuguese
However, morphemes are not only extremely difficult to count, but also sensitive to obligatory lexico-grammatical differences between languages. Thus in the examples given, teapot is made up of two morphemes, but its Portuguese equivalent, bule de chá, is made up of three because the preposition de has to be inserted to link the nouns bule and chá. Likewise, the English sentence Did you like it? has one morpheme more than its Portuguese equivalent Gostou? because the English verb like has to be followed by an object, while its Portuguese equivalent, gostar, doesn’t. As morpheme counts do not discriminate between the addition of morphemes dictated by language-specific differences and the extra morphemes that are a product of voluntary explicitation, they too are not appropriate for analysing explicitation independently of the differences between languages.
Notwithstanding these limitations, the present study works on the assumption that language-dependent biases can be controlled in bi-directional analyses. In other words, when comparing source texts and translations to find out whether text length increases in translation, it is assumed that an analysis of the translations from language y into language z combined with an analysis of the translations from language z into language y may shed some light on the extent to which differences in text length are due to language-dependent factors alone. Put differently, if counting words, characters or morphemes can make texts in one language seem comparatively shorter or longer, we believe this will affect both the translations and the source texts of the language in question. A carefully balanced, bi-directional sample of source texts and translations will therefore enable one to filter out language-dependent biases, and find out whether translations are longer than source texts regardless of the changes in text length dictated by language-specific constraints.
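One way to make this assumption concrete is a toy multiplicative model, which is our own illustration rather than the author’s formulation. Suppose translation multiplies word counts by a factor t, and that expressing the same content in English rather than Portuguese multiplies them by a language factor b. English-to-Portuguese translations then have a length ratio of t/b, Portuguese-to-English translations a ratio of t·b, and b cancels in the geometric mean of the two directional ratios. The values of t and b below are hypothetical.

```python
import math


def directional_ratios(t, b):
    """Translation/source word ratios in each direction under the toy model.

    t: hypothetical translation effect (t > 1 means translations are longer).
    b: hypothetical language effect, i.e. English words needed per Portuguese
       word for the same content (b > 1 means English is 'wordier').
    """
    en_to_pt = t / b  # Portuguese output needs 1/b as many words, then t applies
    pt_to_en = t * b  # English output needs b times as many words, then t applies
    return en_to_pt, pt_to_en


def translation_effect(en_to_pt, pt_to_en):
    """Geometric mean of the two ratios: sqrt((t/b) * (t*b)) == t, so b cancels."""
    return math.sqrt(en_to_pt * pt_to_en)


t_true, b_true = 1.05, 1.10  # hypothetical values
r_en_pt, r_pt_en = directional_ratios(t_true, b_true)
t_est = translation_effect(r_en_pt, r_pt_en)
print(f"EN>PT {r_en_pt:.3f}  PT>EN {r_pt_en:.3f}  recovered t {t_est:.3f}")
```

Under this toy model the arithmetic mean of the two ratios, t(b + 1/b)/2, is slightly larger than t whenever b ≠ 1, which is why the geometric mean is used in the sketch.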
Table 5. Source texts and translations selected for text length analysis
Text ID Author Translator
EBDL2 David Lodge M. Carlota Pracana
EBJB1 Julian Barnes Ana M. Amador
EBJT1 Joanna Trollope Ana F. Bastos
ESNG1 Nadine Gordimer Geraldo G. Ferraz
EUHJ1 Henry James M.F. Gonçalves
EBLC1 Lewis Carroll Y. Arriaga, N. Videira & L. Lobo
EBOW1 Oscar Wilde Januário Leite
EURZ1 Richard Zimler José Lima
PBPC1 Paulo Coelho Alan Clarke
PBMR1 Marcos Rey Cliff Landers
PMMC1 Mia Couto David Brookshaw
PPMC1 Mário de Carvalho Gregory Rabassa
PPSC1 Sá Carneiro Margaret J. Costa
PBAD1 Autran Dourado John Parker
PBMA3 Machado de Assis John Gledson
PPCC1 C. Castelo Branco Alice Clemente
3.2 A balanced corpus
Although COMPARA 5.2 contains a similar number of Portuguese and English words (cf. Table 1), it is not a balanced corpus. According to Frankenberg-Garcia and Santos (2003: 74), the responsibility for achieving balance, if balance is necessary for a particular study, “is left entirely in the hands of the user” of the corpus. In the present study, as discussed in the previous section, balance was deemed essential. It was important to take care that neither Portuguese nor English, nor any particular author or translator, was over-represented. To ensure this, the starting point for the analysis was the selection of a sub-corpus of sixteen source texts by eight different native-English authors and another eight different native-Portuguese authors, translated into Portuguese and English by sixteen different translators. The texts used in the analysis are identified in Table 5.
Another crucial aspect of balance was the size of each source text. In order to assign equal weight to the English-Portuguese and Portuguese-English translations, it was important to take as a starting point for the analysis source-text extracts of the same length in the two languages. COMPARA’s Advanced Search facility was used to retrieve a random selection of sentences from each of the source texts in Table 5 aligned with their corresponding translations. Because of copyright restrictions, some of the samples obtained were much shorter than others. To correct this imbalance, all source texts were reduced to around 1500 words each, which was the approximate size of the smallest source-text sample obtained. This was achieved simply by cutting down on the number of concordances retrieved for each source text until what was left came to approximately 1500 words. The next step was to count how many words there were on the translation side of the parallel concordances. To be extra rigorous in the analysis, translators’ notes were excluded from the study so that only the main translation texts were taken into consideration in the word counts.
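The trimming and counting steps can be sketched as follows. This is a hypothetical reconstruction, not COMPARA’s Advanced Search or the author’s own procedure: it assumes the randomly retrieved concordances are already available as (source, translation) sentence pairs in retrieval order, counts words by whitespace, and marks a translator’s note with a leading “*Translation Note:”, which is purely our convention.

```python
# Hypothetical marker for a translator's note inside an aligned translation unit.
NOTE_MARKER = "*Translation Note:"


def count_translation_words(translation):
    """Words in the main translation text, ignoring any translator's note."""
    main_text = translation.split(NOTE_MARKER, 1)[0]
    return len(main_text.split())


def trim_sample(pairs, target=1500):
    """Keep aligned (source, translation) pairs, in the order retrieved,
    while the running source-side word count stays within `target`.

    Returns the kept pairs, the source word total and the translation word total.
    """
    kept, source_words, translation_words = [], 0, 0
    for source, translation in pairs:
        n = len(source.split())
        if source_words + n > target:
            break  # the next concordance would push the sample past the target
        kept.append((source, translation))
        source_words += n
        translation_words += count_translation_words(translation)
    return kept, source_words, translation_words


# Toy usage with a target of 5 words instead of 1500.
toy = [
    ("one two three", "um dois três"),
    ("four five", "quatro cinco *Translation Note: a note that is not counted"),
    ("six seven eight", "seis sete oito"),
]
print(trim_sample(toy, target=5)[1:])  # → (5, 5)
```

Stopping just before the target is exceeded is one simple reading of “up to or near 1500 words”; allowing a small overshoot would be an equally reasonable choice.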
4. Results
The number of words in the 16 English and Portuguese source texts analysed
and the number of words in their corresponding translations into Portuguese and
English are summarized in Table 6.
Table 6. Distribution of words in source texts and translations of a balanced, bi-directional sample of comparable Portuguese and English text extracts
ST language   Text ID   ST words   TT words
English       EBDL2     1501       1585
English       EBJB1     1499       1467
English       EBJT1     1501       1538
English       ESNG1     1498       1441
English       EUHJ1     1499       1364
English       EBLC1     1499       1321
English       EBOW1     1498       1299
English       EURZ1     1500       1550
Portuguese    PBPC1     1499       1682
Portuguese    PBMR1     1499       1714
Portuguese    PMMC1     1502       1867
Portuguese    PPMC1     1501       1726
Portuguese    PPSC1     1502       1714
Portuguese    PBAD1     1501       1675
Portuguese    PBMA3     1500       1753
Portuguese    PPCC1     1502       1583
Total                   24001      25279
Mean                    1500       1580
According to the above figures, while five Portuguese translations had fewer words than their corresponding source texts in English, the remaining eleven translations (3 English>Portuguese and 8 Portuguese>English translations) were all longer than their corresponding source texts. The figures also show that the increase in the number of words appears to be more pronounced in the translation of Portuguese into English than in the translation of English into Portuguese.
However, as pointed out earlier, these word counts do not mean much in themselves because one language could be stretching the word counts more than the other. To filter out language-dependent biases, we need to consider these figures as a whole. A paired Student’s t-test was therefore applied to the above figures in order to test whether this overall increase in words from source text to translation was significant. The t-value obtained for a one-tailed test at the 5% significance level enabled one to reject the null hypothesis. In other words, it can be said with 95% confidence that the translations in this sample contained on average significantly more words than the source texts.
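The significance test can be re-run from the figures in Table 6. This sketch assumes a paired test on the sixteen per-text (ST words, TT words) pairs, uses only the Python standard library, and takes the critical value (about 1.753 for 15 degrees of freedom, one-tailed, α = 0.05) from standard t tables rather than computing it. On these figures t comes out at roughly 1.93, which exceeds that critical value, consistent with the conclusion reported above.

```python
import math
import statistics

# (ST words, TT words) per text, from Table 6; English sources first.
TABLE_6 = {
    "EBDL2": (1501, 1585), "EBJB1": (1499, 1467), "EBJT1": (1501, 1538),
    "ESNG1": (1498, 1441), "EUHJ1": (1499, 1364), "EBLC1": (1499, 1321),
    "EBOW1": (1498, 1299), "EURZ1": (1500, 1550),
    "PBPC1": (1499, 1682), "PBMR1": (1499, 1714), "PMMC1": (1502, 1867),
    "PPMC1": (1501, 1726), "PPSC1": (1502, 1714), "PBAD1": (1501, 1675),
    "PBMA3": (1500, 1753), "PPCC1": (1502, 1583),
}


def paired_t(pairs):
    """Paired t statistic and degrees of freedom for TT words minus ST words."""
    diffs = [tt - st for st, tt in pairs]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


T_CRIT = 1.753  # one-tailed critical value, alpha = 0.05, df = 15 (from t tables)
t, df = paired_t(TABLE_6.values())
print(f"t = {t:.2f}, df = {df}, reject H0: {t > T_CRIT}")
```

A library routine such as a paired t-test in a statistics package would give the same statistic; the hand-rolled version is used here only to keep the sketch dependency-free.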
5. Conclusions
Assuming that the balanced, bi-directional sample of comparable Portuguese and English source texts and translations used in the present study constituted an effective means of cancelling out the language-dependent biases of word counts, it is possible to conclude that the overall increase in the number of words observed in the translations is more likely to be due to differences between source texts and translations than to lexico-grammatical differences between Portuguese and English. Given that voluntary explicitation often takes the form of the addition of extra words in the translated text, the present results provide quantitative evidence in support of the idea that translations tend to be more explicit than source texts, regardless of the changes dictated by language-specific differences.
Since the present analysis was based on only a small sample of Portuguese and English source texts and translations, in the future it would be necessary to carry out additional comparisons of source texts and translations using more texts. As only literary texts were used, it would also be important to find out if different genres yield similar results. Another essential research question for the future would be to find out if the present results can be replicated using different language pairs.
6. Implications for translator education
It is not uncommon to overhear claims in educated circles that some languages are “wordier” than others, and that this is the reason why translations are longer or, depending on the language direction, shorter than source texts. Trained translators should know better. An important goal of translator education is achieved when trainee translators become aware of the complexity of translation. This includes becoming aware of the reasons why text length can vary from source texts to translations. As I hope to have shown in this paper, the relationship between translation and text length is not dictated just by the morphological and syntactic differences between languages, and obligatory explicitation is something quite different from voluntary explicitation. Translators who become aware of issues such as these can make more conscious and more informed decisions during the translation process.
Acknowledgements
Part of this work was done in the scope of the Linguateca project, jointly funded
by the Portuguese Government and the European Union (FEDER and FSE) under
contract ref. POSC/339/1.3/C/NAC
References
Baker, M. 1993. “Corpus linguistics and translation studies. Implications and applications.” In Text and Technology: In Honour of John Sinclair, M. Baker, G. Francis and E. Tognini-Bonelli (eds), 233–250. Amsterdam/Philadelphia: John Benjamins.
Blum-Kulka, S. 1986. “Shifts of cohesion and coherence in translation.” In Interlingual and Intercultural Communication: Discourse and Cognition in Translation and Second Language Acquisition Studies, J. House and S. Blum-Kulka (eds), 17–35. Tübingen: Gunter Narr.
Frankenberg-Garcia, A. and Santos, D. 2003. “Introducing COMPARA, the Portuguese-English Parallel Corpus.” In Corpora in Translator Education, F. Zanettin, S. Bernardini and D. Stewart (eds), 71–87. Manchester: St. Jerome.
Olohan, M. and Baker, M. 2000. “Reporting that in translated English: Evidence for subconscious processes of explicitation?” Across Languages and Cultures 1(2): 141–158.
Øverås, L. 1998. “In Search of the Third Code: An investigation of norms in literary translation.” Meta, XLIII, 4: 571–588.
Séguinot, C. 1988. “Pragmatics and the Explicitation Hypothesis.” TTR: Traduction, Terminologie, Rédaction 1 (2): 106–114.
Vanderauwera, R. 1985. Dutch Novels Translated into English: The transformation of a ‘minority’
literature. Amsterdam: Rodopi.
Vinay, J. P. and Darbelnet, J. 1958. Stylistique comparée du français et de l’anglais: Méthode de
traduction. Paris: Didier.
Arriving at equivalence
Making a case for comparable general reference
corpora in translation studies
Gill Philip
University of Bologna, Italy
When multilingual corpora are used in translation studies, it is usually assumed that they are either translated (parallel) or comparable, or both; and that their size and text composition are analogous. As general reference corpora become more widely available, it is inevitable that these too should be used to compare and contrast SL norms, thus extending the definition of comparability to include text collections whose size and content may vary considerably, and which are nevertheless considered representative of their languages. This paper addresses the contribution of comparable reference corpora to the identification of translation equivalence. Focusing in particular on native-speaker norms, it demonstrates how the effect of creative and idiosyncratic language can be identified and reproduced by the translator.
Key words: General reference corpora, translation equivalence, synonymy, creativity
1. Introduction
In translation studies, multiple corpora are used to study like with like across lan-
guages. This may be achieved with translation corpora, in which the texts are the
source language (SL) text in one corpus and translations of that text in the other(s);
or, as is now more frequently the case, by using comparable corpora, composed
of SL texts of a similar scope and content in each of the language(s) concerned.
The adoption of comparable corpora has made it possible to move away from the
study of translation as a product, and to focus instead on the identification, and
reproduction in translated texts, of norms proper to the Target Language (TL)
concerned. In other words, rather than studying previous translation choices (in
a translation corpus), comparable corpora reveal how the word, phrase or term is
actually rendered by native speakers of the TL, allowing the translator to produce text which passes as native-like.
While small specialised corpora resolve issues pertinent to specialised lan-
guages or particular domains, it is beyond their scope to provide insights of a
more general nature regarding the language as a whole. With the exception of
technical language, which positively eschews turns of phrase, natural language
abounds with idiomatic, metaphorical and other phraseological expressions,
which create a range of difficulties for the translator. In particular, the peculiarity
of phraseology in a SL text has to be accurately assessed: what meaning is being
conveyed, and to what extent do the individual words convey that meaning? Is
the expression conventional, is it novel, or is it a creative exploitation of a familiar
expression? Is it marked or unmarked?
These problems, which contribute to the production of both oddities and
normalisation in the TL, are matters which cannot be adequately addressed by
small comparable corpora. Translation corpora, however, fare little better, as they offer previously made translation choices rather than providing the translator with a range of TL norms to choose from. Both translation and comparable corpora are restricted in size and scope, encapsulating single or closely related genres; designed to address the specific needs of genre- or domain-specific translation, they are neither large nor wide-ranging enough to give an indication of more generalised norms within the languages under study. These
norms contribute to the perception of naturalness in text, but are not as easily
identified as terminological or structural aspects are.
It is here that the role of general reference corpora proves its worth. While ter-
minology and other text- or genre-specific language are dealt with more profitably
using relatively homogeneous comparable corpora, matters of a more wide-rang-
ing and general nature can be usefully addressed by identifying and contrasting
the language norms displayed in general reference corpora for the languages under
study. This paper describes how comparable general reference corpora can be used
to identify translation equivalents through the analysis and matching of cotextual
patterns, and reveals how knowledge of these norms can be profitably exploited to
avoid normalisation in the translation of creative and idiosyncratic language.
2. Comparability and general reference corpora
In extending the definition of comparable corpus to include text collections whose size and content may vary considerably, a number of matters must be addressed
Arriving at equivalence 61
regarding the composition and size of the corpora, as well as their representative-
ness relative to their respective languages.
In an ideal world, all general reference corpora would follow the same design
criteria, making them of similar size, composed of similar text types in similar
proportions. In practice, however, several standard models coexist, and each has its proponents and detractors. The Brown corpus (1 million words) and those modelled on it, including Frown, LOB and FLOB, comprise text samples of equal
length, but as a corollary of this there are very few whole texts present in the data
set, which means that organisational features may not be adequately represented.
The British National Corpus (BNC) contains 100 million words of both spoken
and written texts (10% and 90% respectively) produced since 1964; sampling only
takes place in texts which exceed 45,000 words in length, and is intended to avoid
the risk of a single author’s idiosyncrasies skewing the data. In common with
the Brown corpus, the BNC is static, and can only be kept up-to-date through
re-issue. The corpus used in this study, the Bank of English, is a monitor corpus
which undergoes constant updating and expansion, and now comprises 450 mil-
lion words of running text.
Publicly accessible corpora for Italian are few and far between. This study draws on the Corpus di Italiano Scritto (CORIS), which was the only Italian language corpus available at the time when this research was being carried out. The composition of the 80-million-word CORIS is modelled on the written component of the Longman Corpus of Spoken and Written English (LSWE), making it qualitatively different from, as well as considerably smaller than, the Bank of English.
Table 1 gives an indication of the distribution of text types in the two corpora.
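Because the two corpora differ so much in size, raw frequencies drawn from them cannot be compared directly. A common corpus-linguistic convention, not a procedure described in this paper, is to normalise counts to occurrences per million words. The minimal Python sketch below does this with the corpus sizes given above, assuming that any counts were drawn from the full corpora at those sizes; the raw count of 90 is purely hypothetical.

```python
# Corpus sizes as given in the text (assumed to be the sizes the counts come from).
CORPUS_SIZES = {
    "Bank of English": 450_000_000,
    "CORIS": 80_000_000,
}


def per_million(raw_count: int, corpus_size: int) -> float:
    """Normalise a raw frequency to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus size must be positive")
    return raw_count * 1_000_000 / corpus_size


if __name__ == "__main__":
    # Hypothetical raw count: the same figure carries very different weight
    # in a 450-million-word corpus and in an 80-million-word one.
    for name, size in CORPUS_SIZES.items():
        print(f"{name}: 90 hits = {per_million(90, size):.3f} per million words")
```

With these sizes, 90 hits correspond to 0.2 occurrences per million words in the Bank of English but 1.125 per million in CORIS.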
Given these differences, it might appear far-fetched to describe the Bank of
English and CORIS as comparable, but the most important consideration to bear
in mind is that they are large general reference corpora, not small, text- or genre-
specific comparable corpora. Dissimilar composition is not proof of incompara-
bility: in accepting that languages are anisomorphic, it should come as no surprise
1. Although representativeness remains a moot point in corpus linguistics, it is beyond the scope of this paper to enter into the details of the argument: the reader is referred to Biber (1993) for a comprehensive account, and to Laviosa (1997) for considerations specific to comparable corpora.
2. See http://www.natcorp.ox.ac.uk/corpus/creating.xml for details of the text composition of the BNC.
3. The author expresses her gratitude to the University of Birmingham/HarperCollins publishers for access to the full Bank of English for the duration of her PhD research, 1997–2003.
4. Access to CORIS is available on request at http://corpora.dslo.unibo.it.
Table 1. Proportions of text types in the Bank of English and CORIS
Text type           Bank of English   CORIS
misc. journalism    65.5%             47.5%
general prose       17%               25%
academic prose      1.5%              12.5%
legal prose         –                 10%
ephemera            1%                5%
spoken              15%               –
that text types have different degrees of prominence and frequency of occurrence
in different cultural and linguistic contexts, and that decisions regarding the rep-
resentativeness of a general reference corpus for any language (or local language
variety) must take this fact into account. In fact, the decision to model CORIS
on LSWE was taken because its make-up was deemed more appropriate to Ital-
ian than the other contending models, most notably the BNC, LOB and Bank
of English (Rossini Favretti 2000: 51). General reference corpora are expected to
exemplify their languages in a balanced way, and just as languages are not translations of each other, representative samples of those languages need not mirror one another’s composition. Viewed in this light, the comparability of the Bank of English and CORIS is not absolute, as it would be between two corpora constructed to the same design specification, but relative: each corpus has been independently constructed to take account of language-specific features and thus to constitute a representative sample of its language.
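The relative nature of this comparability can be made concrete with a little arithmetic on Table 1. The Python sketch below is an illustration, not part of the original study: it transcribes the table, checks that each column sums to 100%, and ranks text types by the size of the gap between the two corpora, treating a dash in the table as 0%.

```python
# Percentages transcribed from Table 1 as (Bank of English, CORIS).
# A dash in the table (text type absent from that corpus) is represented as 0.0.
TABLE_1 = {
    "misc. journalism": (65.5, 47.5),
    "general prose": (17.0, 25.0),
    "academic prose": (1.5, 12.5),
    "legal prose": (0.0, 10.0),
    "ephemera": (1.0, 5.0),
    "spoken": (15.0, 0.0),
}


def column_totals(table: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Sum each corpus column; both should come to 100%."""
    return (
        sum(boe for boe, _ in table.values()),
        sum(coris for _, coris in table.values()),
    )


def gaps(table: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Text types ordered by absolute difference in share between the corpora."""
    diffs = [(text_type, abs(boe - coris)) for text_type, (boe, coris) in table.items()]
    return sorted(diffs, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print("Column totals:", column_totals(TABLE_1))
    for text_type, diff in gaps(TABLE_1):
        print(f"{text_type:<17} {diff:5.1f} percentage points")
```

The two largest gaps are journalism (18 points) and spoken data (15 points, absent from CORIS), which reflect the different design models the two corpora follow.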
3. Background to this research
Having justified the use of the term comparable, it is now necessary to fill in some
of the background to the applicability of comparable general reference corpora to
the translation examples to be examined in Sections 4.1.1 and 5.1.
The research from which this paper is drawn, Philip (2003), is a corpus-driven
study of connotation in non-literary language. It examines the meaning of colour
words as found in conventional linguistic expressions such as to see red, to feel
blue, and green with envy, and explains what factors are responsible for activating
5. Since this research was undertaken, a substantially larger Italian corpus, composed of texts from the Repubblica newspaper (http://dev.sslmit.unibo.it/index.php), has become available to researchers (see Aston & Piccioni 2004). Taken in combination with CORIS, this corpus would constitute a general reference corpus for Italian comparable with the written component of the Bank of English in both size and content.
the connotative meanings of the colour words when the expressions are used in
running text. By comparing colour-word expressions with a number of near-synonyms which display similar phraseological patternings (e.g. to catch red-handed,
to catch in the act, to catch in flagrante delicto), it can be observed that the selec-
tion of one expression over another is largely predetermined by the situational
context which the language is describing, and is both predicted and constrained
by the regularity of patterning in the co-textual environment.
Although colour words are widely considered to be highly salient, Philip
(ibid.) demonstrates that when they occur as part of conventional, non-composi-
tional expressions, their meaning is subjected to the process of delexicalisation in
the same way as any other component of a non-compositional chunk, in accor-
dance with Sinclair’s (1991) idiom principle and Louw’s (2000) theory of progres-
sive delexicalisation. As a result, the metaphorical-connotative meaning potential
of these words remains latent, only rearing its head when the expressions undergo
creative variation.
When the canonical forms of conventional expressions are altered, the way in
which the phrase is interpreted changes radically, because the novel element has
to be integrated into the whole. In order to do this, the non-compositional phrase
is broken down into its component parts, and regains a degree of compositional-
ity. The meaning is then reprocessed to make the relationship of the novel element
to the underlying canonical form contingent; in doing so, meanings which are
normally delexicalised regain a degree of saliency and metaphorical life. This can
be observed by comparing the canonical forms in (1) and (2) with the creative
variants in (3) and (4).
(1) The gang was finally caught red-handed in an armed police ambush in
September 1992.
(2) At the time it was claimed Kerr had been caught red-handed trying to smuggle
arms to the Irish Republic.
(3) A car hi-fi thief was caught Simply Red-handed when he took a CD player into
a store owned by his victim.
(4) Mr. Green apparently had been caught scarlet-handed at his own blackmail
game. Pictures of him with Miss Scarlet were found hidden in Scarlet’s bed-
room.
A similar phenomenon takes place when the cotext includes an element that favours a salient interpretation within the non-compositional phrase, as is the case with (4), which includes colour words in the proper names (Green, Scarlet) in addition to the colour-word component of blackmail. The proximity of colour words in the
phraseological core and in the co-text causes the delexical colour word to be relexicalised, thus re-activating the salient meaning.
In both these types of variation – phrase-internal, and phrase-external – the
chunk is read as a phraseological palimpsest, the sum of the underlying conven-
tional, delexicalised meaning and the novel, salient one that is superimposed on
top of it.
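The distinction between phrase-internal and phrase-external variation can be approximated mechanically, at least as a first pass before manual checking. The Python sketch below is a hypothetical heuristic, not the method used in Philip (2003): it locates caught ...-handed, flags a phrase-internal variant when the modifier is anything other than red, and flags a phrase-external one when a word beginning with a colour term (from a deliberately tiny, assumed list) occurs in the surrounding cotext.

```python
import re

# Hypothetical, deliberately tiny list of colour terms. Prefix matching catches
# compounds such as "blackmail" but will also flag words like "reduce", which a
# human analyst would discard on inspection.
COLOUR_PREFIX = re.compile(r"\b(?:red|green|scarlet|black|white|blue)\w*", re.IGNORECASE)
# The canonical frame "caught red-handed", with the modifier slot left open.
PHRASE = re.compile(r"\bcaught\s+([\w ]+?)-handed\b", re.IGNORECASE)


def classify(line: str) -> str:
    """Label a concordance line as canonical or as a creative variant."""
    match = PHRASE.search(line)
    if match is None:
        return "no match"
    internal = match.group(1).strip().lower() != "red"
    cotext = line[: match.start()] + line[match.end():]
    external = COLOUR_PREFIX.search(cotext) is not None
    if internal and external:
        return "phrase-internal + phrase-external"
    if internal:
        return "phrase-internal"
    if external:
        return "phrase-external"
    return "canonical"


if __name__ == "__main__":
    examples = [
        "The gang was finally caught red-handed in an armed police ambush in September 1992.",
        "A car hi-fi thief was caught Simply Red-handed when he took a CD player into a store owned by his victim.",
        "Mr. Green apparently had been caught scarlet-handed at his own blackmail game. "
        "Pictures of him with Miss Scarlet were found hidden in Scarlet's bedroom.",
    ]
    for example in examples:
        print(classify(example), "|", example[:50])
```

Applied to examples (1), (3) and (4), the heuristic returns canonical, phrase-internal, and phrase-internal plus phrase-external respectively; anything it flags would still need the manual verification discussed in Section 4.2.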
4. Identifying translation equivalents
Conducting such a study with reference to two languages has translation as its ul-
timate aim. Monolingual reference corpora make it possible to identify the mech-
anisms which drive creativity in both languages concerned, and the results ob-
tained demonstrate that the changes in meaning are governed by the same general
principles – the combination of delexical meaning and a contextually-relevant,
salient add-on. This knowledge provides the basis for the informed translation
of unconventional language, especially that found in literature, journalism and
advertising, where word-play and anomalous language often fall victim to the
normalisation process in translation (Kenny 2001: 65–69).
Translation involves a great deal of choice, whether explicit or implicit. Choice
implies the selection of one interpretation, expressed by a particular sequence of
words, over other possible contenders, with the aim of achieving as close an effect as
possible to that obtained in the SL text. As Halliday puts it:
The translator is aware that a given item in the source has a set of possible equiva-
lents in the target language. [S/he is] aware that these are not free variants but
they are contextually conditioned. By ‘contextually conditioned’ I do not mean
that in a given context you must choose A and cannot choose B or C, but that if
you choose A or B or C then the meaning of that choice will differ according to
what the context is. (Halliday 1992: 16)
So what is the translator’s choice based on? Expert knowledge of the languages
provides a substantial degree of intuition regarding equivalence, and language
reference books and media fill in the gaps to a certain extent; but when the trans-
lator is faced with a range of apparently synonymous possibilities, how should he
or she proceed? No two expressions are identical in meaning and function, but
the fine details of the distinctions all too often escape our conscious knowledge.
In this case, the translation network comes into play. This need not involve
the use of corpora, although both translation and comparable corpora clearly add detail that dictionaries and glossaries are not in a position to provide. Reference to
corpus data makes it possible to identify where differences and similarities lie
across languages, thus fine-tuning the translator’s knowledge. But while interest-
ing as an academic exercise, the identification of exhaustive sets of equivalences
involves umpteen passages of translation and back-translation. The example pre-
sented in Váradi and Kiss (2001), based on a translation corpus, demonstrates
how cumbersome the procedure can be, with new terms being added with each
passage from one language to the other. If the same procedure were carried out
using comparable corpora (domain-specific or otherwise representing a restrict-
ed range of the languages concerned), the resulting translation web would be less
messy and its realisation less onerous, if only because the language represented in
the corpora is more homogeneous than that found in general reference corpora.
Whichever source of data is used, the building up of translation networks
generally starts with a SL word form and a hypothetical translation in the TL (see
Tognini Bonelli 2002: 81–82 for details of this procedure using two monolingual
reference corpora, with the option of using a translation corpus, where available,
to enrich the process). As the SL word form’s patternings crystallise into function-
ally-defined “units of translation” (ibid. 80), each of these units must be matched
up with a unit in the TL. However, the unit in the TL may be a leaky correspon-
dence, either being too specific to cover all uses of the SL unit, or, conversely,
having a wider range of application than its SL equivalent. In the former case,
further units in the TL have to be identified; in the latter, the distinct senses in the
TL unit have to be matched to new units in the SL. The network, or “translation
web” (Tognini Bonelli 2001: 150–154), becomes more complex with every process
of translation and back-translation as more and more terms are added and con-
nected up with their equivalents in a potentially never-ending cycle.
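The growth of such a web can be simulated with a breadth-first expansion that alternates translation and back-translation until no new expressions appear. The Python sketch below assumes a small, hypothetical bilingual lexicon; its links are illustrative only, not findings of this study or entries from any dictionary. With real corpus-derived equivalents each pass would typically keep adding items, hence the assumed max_passes cap.

```python
# Hypothetical toy bilingual lexicon (illustrative links only).
EN_TO_IT = {
    "go red": ["diventare rosso", "arrossire"],
    "blush": ["arrossire"],
    "turn red": ["diventare rosso"],
    "redden": ["arrossare"],
}
IT_TO_EN = {
    "diventare rosso": ["go red", "turn red"],
    "arrossire": ["blush", "go red"],
    "arrossare": ["redden"],
}


def translation_web(start: str, max_passes: int = 10) -> tuple[set[tuple[str, str]], int]:
    """Expand a web from an English expression by translation and back-translation.

    Returns the (language, expression) nodes reached and the number of passes
    performed before the web stopped growing (or max_passes was reached).
    """
    seen = {("en", start)}
    frontier = [("en", start)]
    passes = 0
    while frontier and passes < max_passes:
        passes += 1
        next_frontier = []
        for lang, expr in frontier:
            # Translate English nodes into Italian and back-translate Italian ones.
            table, target = (EN_TO_IT, "it") if lang == "en" else (IT_TO_EN, "en")
            for equivalent in table.get(expr, []):
                node = (target, equivalent)
                if node not in seen:
                    seen.add(node)
                    next_frontier.append(node)
        frontier = next_frontier
    return seen, passes


if __name__ == "__main__":
    nodes, passes = translation_web("go red")
    print(passes, sorted(nodes))
```

Starting from go red, this toy web closes after three passes and never reaches redden or arrossare, which shows how much the outcome depends on the expression chosen as the starting point.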
4.1 Translation equivalence and the paradigmatic axis
One way to place a control on the network is to work on the basis that the SL
word is one of several members of a larger semantic set, and as such it is dis-
tinguished and distinguishable from a range of near-synonyms. In this way, the
analysis of the patternings of the SL term and its near-synonyms is carried out
before embarking on the translation process. If the same procedure is applied
to the posited TL equivalent term, i.e. that it is considered as a member of an
analogous paradigm, for which the various patternings have to be identified, then
the location of translation equivalents becomes a matter of matching up pattern-
ings, rather than searching for new expressions every time a new pattern appears.
6. In particular, the reader is referred to the schematic representation of sorrow and its translation into German (Váradi and Kiss 2001: 169).
The correspondences are more detailed and accurate, and the tangled web of
translations can be replaced by a more robust and linear schema of one-to-one
correspondences that are arrived at independently of which language is to be con-
sidered the source or the target.
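One way to operationalise this paradigm-based matching, assuming each unit has already been profiled as a set of features, is to score every SL–TL pair by feature overlap and accept the best pairs one-to-one. The Python sketch below uses Jaccard similarity with a greedy assignment; the feature labels, unit names and 0.5 threshold are all hypothetical, and the paper itself does not prescribe a scoring formula.

```python
def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Proportion of shared features: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_units(
    sl: dict[str, frozenset[str]],
    tl: dict[str, frozenset[str]],
    threshold: float = 0.5,
) -> tuple[dict[str, str], list[str]]:
    """Greedily pair SL and TL units one-to-one by descending feature overlap."""
    scored = sorted(
        ((jaccard(fs, ft), s, t) for s, fs in sl.items() for t, ft in tl.items()),
        key=lambda item: item[0],
        reverse=True,
    )
    matches: dict[str, str] = {}
    used_tl: set[str] = set()
    for score, s, t in scored:
        if score < threshold:
            break  # every remaining pair scores lower still
        if s in matches or t in used_tl:
            continue  # keep the correspondences one-to-one
        matches[s] = t
        used_tl.add(t)
    unmatched = [s for s in sl if s not in matches]
    return matches, unmatched


# Hypothetical profiles standing in for full collocational/colligational analyses.
SL_UNITS = {
    "SL-1": frozenset({"intransitive", "human subject", "embarrassment"}),
    "SL-2": frozenset({"transitive", "object: face", "heat"}),
    "SL-3": frozenset({"intransitive", "plant subject", "ripening"}),
}
TL_UNITS = {
    "TL-1": frozenset({"intransitive", "human subject", "embarrassment"}),
    "TL-2": frozenset({"transitive", "object: face", "cold"}),
}

if __name__ == "__main__":
    print(match_units(SL_UNITS, TL_UNITS))
```

Here SL-3 finds no partner above the threshold, which corresponds to the situation discussed in Section 5, where a new, related paradigm has to be opened up.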
4.1.1 Case study: Go red

The approach outlined above is best illustrated with a practical example. With the expression to go red as a point of departure, it is necessary to decide which
other expressions belong to the paradigm in English, to posit an equivalent term
in the TL (here, Italian), and to identify its near-synonyms. This stage does not
require use of a corpus, as the necessary information can be found in standard
language reference works (mono- and bilingual dictionaries and thesauri). Thus,
from the single item to go red, the English paradigm can be identified as to go red,
to become red, to blush, to flush, to redden, and to turn red. The Italian equivalent
selected is diventare rosso; the other members of its paradigm are arrossare, arros-
sarsi, arrossire, arrossirsi, farsi rosso (in viso/faccia), and far salire il sangue.
The expressions are initially analysed without any reference being made to
their translatability. The English terms are studied via corpus data from an Eng-
lish general reference corpus (in this case, the Bank of English), and the Italian
terms are examined through Italian general reference corpus data (CORIS). Each
term is broken down into its sense divisions (for example, separating out reflexive
and non-reflexive forms of arrossar(si) and arrossir(si), and dividing the transitive
and intransitive forms of flush) and extended units of meaning (Sinclair 1996).
The expressions are profiled in terms of their collocational patterns, their colli-
gational and semantic preferences, any extra-linguistic function or context of use
that is indicated in the data, and, once the more detailed sub-senses and phrase-
ologies have been identified, any apparent semantic prosody that these suggest
is also noted. Only once this detailed monolingual examination is complete is
it possible to match up terms on the basis of the linguistic (and extra-linguistic)
features that they have in common. This makes it possible to identify transla-
tion equivalence in a much more detailed and consistent way than any approach
which takes the word alone as its starting point. By subdividing all the terms into
their smaller units, it is possible to recognise, for example, that the presence of
7. This represents the full paradigm of translations found in Ragazzini 1995. It should be noted that this is not an exhaustive list of every possible comparable expression, and that it excludes paraphrase.
8. The analysis of this data set made it evident that semantic prosodies cannot be identified for each node, but are specific to larger units of meaning which include the node and the particular collocational patternings which form around it: for this reason they are identified last of all.
reflexivity can be a determining feature in arriving at translation equivalence
(arrossare corresponds to redden, yet its corresponding reflexive form arrossarsi
shares the same patterning as become red); or that the terms give rise to similar
(and equivalent) phraseological or terminological constructions, as is the case
with have the grace to blush and degnare di arrossire; and that the same subdivi-
sions of meaning may be expressed in similar ways across both languages, for
example go red as a beetroot and diventare rosso come un peperone which refer to
embarrassment, while their related forms go red as a lobster and diventare rosso
come un gambero describe sunburn.
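The point about reflexivity can be expressed as a data structure: equivalence is keyed not on the lemma alone but on the lemma together with the feature that distinguishes its units. The Python sketch below records only the correspondences stated above; the profile identifiers such as P-redden are hypothetical stand-ins for the full collocational and colligational profiles on which the matching actually rests.

```python
from typing import Optional

# English units mapped to hypothetical profile identifiers.
EN_PROFILES = {
    "redden": "P-redden",
    "become red": "P-become-red",
    "have the grace to blush": "P-grace-to-blush",
    "go red as a beetroot": "P-simile-embarrassment",
    "go red as a lobster": "P-simile-sunburn",
}

# Italian units keyed on (form, reflexive?), so arrossare and arrossarsi
# are distinct entries even though they share a lemma.
IT_PROFILES = {
    ("arrossare", False): "P-redden",
    ("arrossare", True): "P-become-red",  # i.e. arrossarsi
    ("degnare di arrossire", False): "P-grace-to-blush",
    ("diventare rosso come un peperone", False): "P-simile-embarrassment",
    ("diventare rosso come un gambero", False): "P-simile-sunburn",
}

# Invert the English table so that a shared profile leads to its English unit.
EN_BY_PROFILE = {profile: unit for unit, profile in EN_PROFILES.items()}


def english_equivalent(form: str, reflexive: bool = False) -> Optional[str]:
    """Return the English unit sharing a profile with the Italian unit, if any."""
    profile = IT_PROFILES.get((form, reflexive))
    return EN_BY_PROFILE.get(profile) if profile else None


if __name__ == "__main__":
    print(english_equivalent("arrossare"))                 # non-reflexive form
    print(english_equivalent("arrossare", reflexive=True))  # arrossarsi
```

Keying on the pair rather than the bare lemma is what lets arrossare and arrossarsi point to redden and become red respectively.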
4.2 Manual and automatic profiling
With its requirement for detailed analysis of members of a semantic set rather
than of a single term, the paradigmatic model may give the impression of be-
ing perhaps unnecessarily time-consuming, but it should be remembered that it
is proposed as an alternative to the existing – and considerably more onerous –
method involving successive stages of translation and back-translation. If the in-
tention is to compile some sort of translation database or to improve translators’
reference works, then the corpus approach gives the most comprehensive account
of how cotextual features contribute to the building up of meaning. It can provide
extremely detailed information about how the words in question combine, the
units of meaning that they generate, their textual positioning and their extra-lin-
guistic function; all these aspects are potentially necessary to the translator.
Word profiling of the sort discussed here can be done manually, automati-
cally or through a combination of both. The precise approach taken depends on
the time available, the potential of the analysis tools, and indeed the corpus itself, as
some can only be interrogated through their built-in query software, which may
limit the degree to which the analysis can be automated. The profiling discussed
in this paper was carried out mainly by hand, the choice being determined by the
corpora used: both the Bank of English and CORIS are only available by remote
access, and can only be interrogated by their built-in query software.
Manual profiling is time-consuming, but generally highly accurate, as the
human analyst is able to recognise semantic relations between collocates more
easily than a computer can. Automatic profiling software is very sophisticated
and detailed, but is limited in the extent to which it can cope with semantic re-
lations. To date, no applications can go beyond taxonomic semantic relations,
9. One such application is the Sketch Engine (Kilgarriff and Tugwell 2002, Kilgarriff et al. 2004), which runs on a variety of corpora in different languages: http://www.sketchengine.co.uk/.
i.e. hierarchical relations, lexical and semantic sets, to address the kind of ad hoc
relations which humans create and interpret freely, which are based on shared at-
tributes (see Glucksberg and Keysar 1993). When a regular pattern is found at the
abstract level of semantic preference rather than in the concrete realm of count-
able, recurrent word-forms, the ability to appreciate such relations is especially
important. Humans also find it easier to spot long-distance collocates (Siepmann 2005), where the unit of meaning extends considerably beyond the concordance line visible on the computer screen; they are also able to make sense
of incomplete text fragments or fractured phraseological patterns (Moon 1998)
such as humorous exploitations of idiomatic expressions where the original is
truncated or modified.
Manual profiling can of course be aided by corpus tools which guide the ana-
lyst towards particular patterns and phenomena. The “picture” option in the Bank
of English’s suite of tools (see Krishnamurthy 2000: 36–39) gives an overview of
collocational frequency between n–3 and n+3 around the search term, and was
used extensively in the analysis of the English data in this study; most PC concor-
dance packages now include very sophisticated tools for calculating collocations,
patterns, n-grams and so on by frequency. As noted above, frequency counts and
string-searches have their limitations, but they make initial profiling quick and
reliable, with manual intervention confined to verification, fine-tuning and trou-
ble-shooting.
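A positional collocate count of the kind the picture option displays can be approximated in a few lines. The Python sketch below is not the Bank of English tool, whose implementation is not described here. It simply tallies, for each position from n–3 to n+3, the word forms found around a single-word node in a list of concordance lines; the three lines in the example are invented. By construction it sees nothing beyond the window, which is exactly the long-distance limitation noted above.

```python
import re
from collections import Counter


def positional_profile(lines: list[str], node: str, span: int = 3) -> dict[int, Counter]:
    """Count word forms at each offset from -span to +span around a one-word node."""
    node = node.lower()
    profile = {offset: Counter() for offset in range(-span, span + 1) if offset != 0}
    for line in lines:
        tokens = re.findall(r"[a-z']+", line.lower())
        for i, token in enumerate(tokens):
            if token != node:
                continue
            for offset, counts in profile.items():
                j = i + offset
                if 0 <= j < len(tokens):  # ignore positions outside the line
                    counts[tokens[j]] += 1
    return profile


if __name__ == "__main__":
    # Invented concordance lines, for illustration only.
    lines = [
        "Her face went red with anger",
        "He turned red with rage",
        "The leaves turned red in autumn",
    ]
    for offset, counts in positional_profile(lines, "red").items():
        print(f"{offset:+d}", counts.most_common(2))
```

On these lines, turned is the most frequent item at n–1 and with at n+1, while the third line's leaves points towards the separate "ripening" sense discussed in Section 5.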
5. Native norms and creativity in translation
In all attempts at pattern matching there will inevitably be some forms that ap-
pear not to have an equivalent, at least insofar as the paradigms studied are con-
cerned. This is the case with the sense of turn red that collocates with leaves and
berries. Although turn red can nearly always be translated as diventare rosso, this
form in Italian never occurs with plant collocates to give the meaning “ripening”,
nor do any of its near-synonyms. In such a case as this, a new, related paradigm
can be opened up for exploration (ripen and maturare, with their synonyms). On
the other hand, should no translation be found to be appropriate, then, as Baker
reminds us, “[a] certain amount of loss, addition, or skewing of meaning is often
unavoidable” (1992: 57). The recurring phrase arrossire fino alle radici dei capelli (literally, “to blush to the roots of one’s hair”) is one such case in point. The translator should try to find an equivalent English expression (taking the verb as the base form), or use a paraphrase; in either case, the choice must combine the sense
of blushing (including the semantic prosody), and the emphasis of extent: blush
deeply, go bright red, turn beetroot. A quasi-literal rendition would be marked
in English, and unless a particular effect was being sought, this would not be an
appropriate translation solution for a form which is unmarked in Italian. An un-
translated borrowing would only be appropriate if attention were deliberately be-
ing drawn to the Italianness of the original. A literal translation with gloss might
be appropriate in a commentary, but is unlikely to be so in narrative.
The adoption of a paradigm in translation adds a further degree of conscious-
ness to the translation process. The translator is able to enter into an awareness
of the language choices made by the author, and thus not only find the most ac-
curate translation, but also note the differences between this term and the oth-
ers which could have been used, but were not. This notion takes on particular
importance when the language being translated differs from the norm – either in
extreme cases such as the translation of poetry, or in the day-to-day inventiveness
that characterises normal language use. Peculiarities and deviations from the SL
norm can be assessed in relation to that norm and replicated in the TL, in full
consciousness rather than by mere instinct. This means that the translation can
match the effect of the original, because the mechanisms governing the effect can
be identified and reproduced.
Using general reference corpora as an aid to the translation process means
using data which makes it possible to assess and compare norms across the lan-
guages involved. Translated text does not fail utterly in this role, but it bears the
sign of translation choices already made – for good or ill. With normalisation
prevalent in the translation of (apparently) atypical language, it is useful to be
able to compare, as Kenny does (2001: 125ff.), translation corpus data with com-
parable corpus data, and to do so both for the SL and the TL.
5.1 Expressing emotion through colour
Colour words are typically used in European languages to express emotional states,
mainly because there is a fairly transparent metonymical connection between, for
instance, adrenaline speeding up the flow of blood through the body, and the face
becoming flushed or red. So it is justifiable to expect that colour-word expres-
sions should be used to refer to the manifestation of emotion in several languages.
What may come as something of a surprise, however, is that the colour words
typically used are not necessarily the same. For example, within Europe, English
is odd in that it associates the colour green with envy, when other languages prefer
yellow, the colour of bile; and blue meaning depressed (yet grey depressing) is far
from universal. But the non-equivalences do not end here.
Casting aside any cultural reasons why colours and emotional states should
not correspond exactly (see Niemeier 1998; Philip 2003: 151–164), the fact
Table 2. Colours of rage and anger in Italian and English (corpus frequencies in brackets)
Italian (CORIS)         English (Bank of English)
nero di rabbia (8)      black (0)
rosso di rabbia (5)     red with anger (18) / rage (28)
verde di rabbia (5)     green with anger (1) / rage (2)
bianco di rabbia (2)    white with anger (7) / rage (9)
blu dalla rabbia (1)    blue (0)
viola (0)               purple with anger (1) / rage (21)
rosa (0)                pink with anger (3) / rage (2)
remains that there is a degree of language variation in this area, and that a translator should be in a position to address it appropriately. Consider the following corpus extracts (5)–(7), in which rabbia (rage) is assigned different colours – nero (black), viola (purple), and verde (green).
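(5) Chi sta vicino al Castel de’ Britti, dice che è nero di rabbia, che sogna la rivincita.
[‘Those close to Castel de’ Britti say that he is black with rage, that he dreams of revenge.’]
(6) È viola di rabbia, una furia scatenata
[‘(S)he is purple with rage, a fury unleashed.’]
(7) Quando mia nonna le ha risposto: “Speriamo di no, altrimenti verrà fuori una puttana come te!”, ho visto mia madre diventare verde di rabbia.
[‘When my grandmother replied to her, “Let’s hope not, otherwise she’ll turn out a whore like you!”, I saw my mother go green with rage.’]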
How normal is it to use these colours to describe anger? An English-speaker wishing to translate these examples might (erroneously) consider all three to be
innovative variations of the canonical form rosso di rabbia (red with rage). This
preconception derives from the translator’s L1 in which red with anger is the ca-
nonical expression; and while viola may not seem unusual because purple with
rage is quite common (see Table 2 for frequencies), nero would appear odd as
black is very uncommon, and in fact does not appear at all with either anger or
rage in the Bank of English data.
If these preconceptions are checked against Italian norms, as presented in
the general reference corpus, a different picture emerges. The fact of the matter
is that nero (8 of the 22 occurrences) is the most commonly used colour word in
this context, followed by rosso and verde (5 each), and bianco (white). It therefore
becomes apparent that the only unusual colour of the three that appear in the ex-
amples is viola, with both nero and verde being at least as frequent as the expected
rosso.10
10. The single example of viola di rabbia was located on the Internet; there were no occurrences
in the CORIS data.
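The contrast between the two sets of norms is easiest to see when each language’s colour terms are ranked by frequency. The Python sketch below transcribes Table 2, combining the anger and rage counts for English (a simplification made here, not in the table), and ranks each side with ties broken alphabetically. Only rankings within each language are compared, so the difference in corpus size does not affect the result.

```python
# Frequencies transcribed from Table 2.
# Italian: CORIS counts for "<colour> di/dalla rabbia".
ITALIAN = {"nero": 8, "rosso": 5, "verde": 5, "bianco": 2, "blu": 1, "viola": 0}
# English: Bank of English counts, "with anger" + "with rage" combined.
ENGLISH = {
    "black": 0,
    "red": 18 + 28,
    "green": 1 + 2,
    "white": 7 + 9,
    "blue": 0,
    "purple": 1 + 21,
    "pink": 3 + 2,
}
GLOSS = {"nero": "black", "rosso": "red", "verde": "green",
         "bianco": "white", "blu": "blue", "viola": "purple"}


def rank(counts: dict[str, int]) -> list[str]:
    """Order colour terms by descending frequency, breaking ties alphabetically."""
    return sorted(counts, key=lambda term: (-counts[term], term))


if __name__ == "__main__":
    it_rank, en_rank = rank(ITALIAN), rank(ENGLISH)
    for term in it_rank:
        english = GLOSS[term]
        print(f"{term:<7} rank {it_rank.index(term) + 1} in Italian; "
              f"{english:<7} rank {en_rank.index(english) + 1} in English")
```

Nero heads the Italian list, while black ties for last place in English: this is the mismatch on which the translation decision discussed below turns.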
What are the implications of this for translation? If the corpus data shows that
nero is commonly used in this pattern but black is not, how should the transla-
tor proceed? At this stage the principles of delexicalisation in conventionalised
phraseology come back to centre stage. Nero di rabbia is unmarked, and the term
which is correspondingly unmarked in English is red with anger/rage. In matching these expressions, the translator must set aside the salient meaning of the colour word in favour of the unmarked phraseological meaning, which in this case is equivalent. The same is true for verde di rabbia, again an unmarked form. Should there
be text-internal reasons for considering the colour to be relevant, the translator
could use the alternative, livid; but if there are no special circumstances to take
into consideration, then again red would serve to translate verde. The anomalous
viola di rabbia would be inaccurately rendered by an unmarked form such as
purple with rage, so some alternative rendering would be desirable; the most likely
course of action would be to move away from the basic colour terms (Berlin &
Kay 1969) and select a particular shade such as plum, puce or even regal purple.
In doing so, the colour is conveyed accurately, and the effect of the SL original is preserved because the phrase is not normalised.
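The decision procedure outlined in this paragraph can be summarised as a small function. The Python sketch below is a hypothetical simplification: markedness is read off CORIS frequency with an assumed cut-off of 2 occurrences, the only salience-sensitive alternative encoded is livid for verde, and puce with rage stands in for the non-basic shades suggested above. It covers only the three colours in examples (5) to (7).

```python
# CORIS frequencies for "<colour> di rabbia" (Table 2), restricted to examples (5)-(7).
CORIS_COUNTS = {"nero": 8, "verde": 5, "viola": 0}
UNMARKED_THRESHOLD = 2  # hypothetical cut-off, not a value from the study

UNMARKED_RENDERING = "red with rage"             # unmarked English counterpart
SALIENT_ALTERNATIVES = {"verde": "livid"}        # used when the colour itself matters
MARKED_RENDERINGS = {"viola": "puce with rage"}  # a non-basic shade keeps it marked


def render(colour: str, colour_relevant: bool = False) -> str:
    """Suggest an English rendering of '<colour> di rabbia' (illustrative only)."""
    if colour not in CORIS_COUNTS:
        raise ValueError(f"no corpus data for {colour!r}")
    if CORIS_COUNTS[colour] >= UNMARKED_THRESHOLD:
        # Unmarked in Italian: match with the unmarked English phrase,
        # unless text-internal reasons make the colour itself relevant.
        if colour_relevant and colour in SALIENT_ALTERNATIVES:
            return SALIENT_ALTERNATIVES[colour]
        return UNMARKED_RENDERING
    # Rare or absent in the reference corpus: keep the rendering marked.
    return MARKED_RENDERINGS[colour]


if __name__ == "__main__":
    for colour in CORIS_COUNTS:
        print(colour, "->", render(colour))
    print("verde (colour relevant) ->", render("verde", colour_relevant=True))
```

The design choice mirrors the argument: markedness in the SL, not the literal colour term, decides whether the TL rendering should be unmarked or marked.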
Creative use of language may well make up only a small proportion of the lan-
guage that is translated every day, but it is important both culturally and linguisti-
cally for a translator to render it in an appropriate manner. If the innovative and
marked can be compared to related, unmarked forms in the SL, then the translator’s job is greatly facilitated. By considering conventional language as largely delexicalised, and innovative language as being a combination of a delexical support
and a contextually relevant addition, it is possible to go about achieving the same
effect in the TL by adhering to the same principles, essentially re-creating the TL
text in the same way as the SL text was constructed. In order to do this, however,
the translator must have access to a large quantity of data from which to identify
language norms, and that data comes in the form of general reference corpora.
Smaller corpora are simply inadequate when it comes to dealing with stretches
of text, fixed and semi-fixed phrases, and less-frequently used words and expres-
sions, though they serve a fundamental role in the identification of genre-related
phenomena.
6. Discussion
The use of comparable general reference corpora as an aid to the translation proc-
ess is often one-sided. TL corpora are often used as a control to ensure that the
translation produced sounds natural, but less use is made of corpora in assessing
the naturalness of the SL original. While it must be acknowledged that absolute
equivalence remains an elusive and rare phenomenon, adopting a data-assisted approach facilitates the identification of patterns and preferences across languages, making it possible to match up SL and TL expressions that are functionally as
well as formally similar.
Choice in translation is related to choice in the SL, and this can be identified by comparing a given expression against its possible alternatives along the paradigmatic axis. In this way the translator obtains a more immediate and detailed impression of the meaning being conveyed, and if an equivalent paradigm of choice is set up for the TL, the most suitable correspondences can be identified and used in the translated text.
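The paradigmatic comparison described above can be approximated with simple frequency counts. The sketch below is a hypothetical illustration, not the author's procedure: it counts how often each member of a set of alternative phrases occurs in a text and ranks them by their share of all hits, so that the conventional choice and the rarer, marked ones stand out. The phrase list and sample sentence are invented; in practice the counts would come from a general reference corpus, and the naive tokenizer is an assumption.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens made of letters only (a deliberately naive tokenizer)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def count_phrase(tokens: list[str], phrase: str) -> int:
    """Count contiguous occurrences of a multi-word phrase in a token list."""
    target = tokenize(phrase)
    n = len(target)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)


def paradigm_profile(text: str, alternatives: list[str]) -> list[tuple[str, int, float]]:
    """Rank alternative phrases by frequency: (phrase, count, share of all hits)."""
    tokens = tokenize(text)
    counts = Counter({alt: count_phrase(tokens, alt) for alt in alternatives})
    total = sum(counts.values()) or 1  # avoid dividing by zero if nothing matches
    return [(alt, n, n / total) for alt, n in counts.most_common()]


if __name__ == "__main__":
    sample = "He went red with rage. Later she was red with rage. Then he turned purple with rage."
    paradigm = ["purple with rage", "red with rage", "white with rage"]
    for phrase, n, share in paradigm_profile(sample, paradigm):
        print(f"{phrase:18} {n:3} {share:.2f}")
```

On the invented sample, red with rage (2 hits) ranks above purple with rage (1) and white with rage (0); the same profile built for a TL paradigm would show which correspondence is the unmarked one.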
Delexicalisation is a fundamentally important concept when translating both conventional and unconventional language. There is a distinct difference between the meaning values of words in conventionalised utterances and their values in non-standard uses of the language, and this should be acknowledged and acted upon when translating. Corpus data highlight the conventional and recurrent, so that unusual structures stand out by their rarity. An awareness of the norms underlying non-standard and unconventional language makes it possible for the translator to recreate its effect in a structured and systematic way, rather than relying on the “intuition and hunch, inspiration and even flashes of genius” (Firth 1968: 85) that seem integral to the translation process.
Corpus tools are becoming increasingly sophisticated, and word profiling is therefore a much more straightforward matter than it was a few years ago. Teaching trainee translators how and when to use these tools will heighten their sensitivity to language patterning and improve the accuracy and fluency of their translations. A combination of automatic processing, manual analysis and greater awareness of how languages make meaning will give translators the chance to have equivalence at their fingertips.
References
Aston, G. and Piccioni, L. 2004. “Un grande corpus di italiano giornalistico.” In Atti del conve-
gno nazionale AitLA, G. Bernini, G. Ferrari and M. Pavesi (eds). Perugia: Guerra. Available
from http://www.sslmit.unibo.it/~guy/aitla_repubblica.htm (accessed 25 March 2008).
Baker, M. 1992. In Other Words: A Coursebook on Translation. London/New York: Routledge.
Berlin, B. and Kay, P. 1969. Basic Color Terms: Their Universality and Evolution. Berkeley: Uni-
versity of California Press.
Biber, D. 1993. “Representativeness in Corpus Design.” Literary and Linguistic Computing 8(4): 243–257.
Firth, J. R. 1968. “A Synopsis of Linguistic Theory, 1930–1955.” In Selected Papers of J. R. Firth 1952–59, F. R. Palmer (ed.), 168–205. London/Harlow: Longmans.
Glucksberg, S. and Keysar, B. 1993. “How Metaphors Work.” In Metaphor and Thought (2nd and revised edition), A. Ortony (ed.), 401–424. Cambridge: Cambridge University Press.
Halliday, M. A. K. 1992. “Language Theory and Translation Practice.” Rivista internazionale di
tecnica della traduzione 0 (pilot issue): 15–25.
Kenny, D. 2001. Lexis and Creativity in Translation. A Corpus-based Study. Manchester: St.
Jerome.
Kilgarriff, A. and Tugwell, D. 2002. “Sketching words.” In Lexicography and Natural Language
Processing: A Festschrift in Honour of B. T. S. Atkins, Marie-Hélène Corréard (ed.), 125–
137. Göteborg: EURALEX.
Kilgarriff, A., Rychlý, P., Smrž, P. and Tugwell, D. 2004. “The Sketch Engine.” In Proceedings of
the Eleventh EURALEX International Congress, 105–116. Lorient: Université de Bretagne-
Sud.
Krishnamurthy, R. 2000. “Collocation: From silly ass to lexical sets.” In Words in Context: A
Tribute to John Sinclair on his Retirement, C. Heffer and H. Sauntson (eds), 31–47. Bir-
mingham: The University of Birmingham.
Laviosa, S. 1997. “How Comparable Can ‘Comparable Corpora’ Be?” Target 9(2): 289–319.
Louw, W. E. 2000. “Some implications of progressive delexicalisation and semantic prosodies
for Hallidayan metaphorical modes of expression and Lakoffian ‘Metaphors we Live By’.”
Privately-distributed version of “Progressive delexicalization and semantic prosodies as
early empirical indicators of the death of metaphors”. Paper read at the 11th Euro-Inter-
national Systemic Functional Workshop: Metaphor in systemic functional perspectives,
University of Gent (Belgium), 14–17 July 1999.
Moon, R. 1998. Fixed Expressions and Idioms in English: A Corpus-Based Approach. Oxford:
Clarendon.
Niemeier, S. 1998. “Colourless green ideas metonymise furiously.” Rostocker Beiträge zur Sprachwissenschaft 5: 119–146.
Philip, G. 2003. Collocation and Connotation: A corpus-based investigation of colour words in English and Italian. PhD thesis. The University of Birmingham, UK. Available from http://amsacta.cib.unibo.it/archive/00002266 (accessed 25 March 2008).
Ragazzini, G. (ed.) 1995. Il Ragazzini: Dizionario inglese italiano – italiano inglese (3rd edition).
Bologna: Zanichelli.
Rossini Favretti, R. 2000. “Progettazione e costruzione di un corpus di italiano scritto: CORIS/
CODIS.” In Linguistica e informatica: Corpora, multimedialità e percorsi di apprendimento,
R. Rossini Favretti (ed.), 39–56. Rome: Bulzoni.
Siepmann, D. 2005. “Collocation, colligation and encoding dictionaries. Part 1: Lexicological
aspects.” International Journal of Lexicography 18(4): 409–443.
Sinclair, J. M. 1991. Corpus, Concordance, Collocation. Oxford: OUP.
Sinclair, J. M. 1996. “The Search for Units of Meaning.” TEXTUS 9(1): 75–106.
Tognini Bonelli, E. 2001. Corpus Linguistics at Work. Amsterdam and Philadelphia: John Ben-
jamins.
Tognini Bonelli, E. 2002. “Functionally complete units of meaning across English and Italian:
Towards a corpus-driven approach.” In Lexis in Contrast, B. Altenberg and S. Granger
(eds), 73–95. Amsterdam and Philadelphia: John Benjamins.
Váradi, T. and Kiss, G. 2001. “Equivalence and Non-equivalence in Parallel Corpora.” Interna-
tional Journal of Corpus Linguistics 6 (special issue): 167–177.
Virtual corpora as documentation resources:
Translating travel insurance documents
(English-Spanish)*
Gloria Corpas Pastor and Miriam Seghiri

Universidad de Málaga (Spain)
The inclusion of documentation as a core subject in the curriculum of Translation and Interpretation degrees clearly underlines its importance to translators. Training in this discipline is considered essential for translators, since only thorough and conscientious documentation work allows an adequate translation of a specialised text. The sources of information available to the translator are extremely varied, ranging from an oral consultation with an expert to a search of specialised glossaries and dictionaries. In the field of translation, however, perhaps the most relevant documentation activity today involves the use of the Internet and, closely related to this, the compilation and management of virtual corpora.
In this chapter, we present a systematic methodology for corpus compilation based on electronic resources available on the Internet. The methodology is illustrated through the creation of a virtual corpus of travel insurance documents in English and Spanish, whose representativeness is subsequently determined using ReCor, a computer programme specifically designed for this purpose. Finally, some specific examples of possible uses of this type of document in direct and inverse translation are given.
Key words: Corpus compilation and representativeness, specialized corpora, legal translation
* The research reported in this paper has been carried out in the framework of the R&D
projects BFF2003-04616 (Spanish Ministry of Science and Technology/EU ERDF, 2003–2006)
and HUM-892 (Andalusian Ministry of Education, Science and Technology, 2006–2009).
1. Introduction
Since the tourist industry is one of the principal driving forces behind the Spanish economy,1 it is hardly surprising that there is a large demand for translations of insurance policies in the tourism sector, both from Spanish into English and from English into Spanish (cf. ACT 2005). Although this economic reality could be transitory, the right of European consumers to demand translations of this type of document under European directives2 on insurance matters and their respective national transpositions3 should also be taken into account. These directives recognise the right of the party taking out insurance to receive a contract4 written not only in the official language of the member state where the agreement is made, but also in a language which they may specify. Subsequent directives, such as 2002/92/EC,5 have also increased demand for translations of all the formal documents that constitute the contract. In the following pages, we shall
1. Tourism is responsible for a huge volume of business in the international economy with
Europe occupying a privileged position at the top of the world scale. In 2006 Europe generated
$6,466.2 billion in this sector, equivalent to 10.3% of the world’s gross domestic product (GDP),
forecast to rise to 11% by 2011, accounting for 8.7% of total employment (cf. WTTC 2006a).
Also see studies by the WTTC concerning the United Kingdom (2006b), Ireland (2006c) and
Spain (2006d) for a more detailed analysis of the figures for these countries in this sector.
2. We refer to the Third EC Directive on Non-Life Insurance (92/49/EEC) and the Third EC
Directive on Life Assurance (92/96/EEC).
3. These transpositions, which are primarily aimed at consumer protection and fostering linguistic plurality in Europe, are given expression, in the case of Spain, in the Ley 18/1997, de 13 de mayo, de modificaciones del artículo 8 de la Ley de Contrato de Seguro, para garantizar la plena utilización de todas las lenguas oficiales en la redacción de los contratos [amending Article 8 of the Insurance Contract Act to guarantee full use of all official languages in drafting contracts] (BOE, 14th May 1997); in the case of the United Kingdom, in Statutory Instrument 2004 No. 353, the Insurers (Reorganisation and Winding Up) Regulations 2004; and, finally, in the case of the Republic of Ireland, in the Insurance Act 2000.
4. The policy (póliza, in Spanish) is the document which gives physical form to the insurance
contract. In addition, it is where the obligations and rights of both the insurer and the insured
person are set out, where the persons or objects that are insured are defined and the guarantees
and compensation in the case of damage are established. It also represents the formalisation
and culmination of the whole process of contracting the insurance. As a result, in many cases
the insurance policy may be referred to as the contrato (contract) (cf. Ley 50/1980; Insurance Act
2000; The Financial Services and Markets Act 2000).
5. We refer specifically to Directive 2002/92/EC of the European Parliament and of the Council
of 9 December 2002 on insurance mediation. In Article 13 of this directive, under “Information
conditions”, it is specified that “All information to be provided to customers in accordance with
Article 12 shall be communicated: (a) on paper or on any other durable medium available and
accessible to the customer; (b) in a clear and accurate manner, comprehensible to the customer;
present a systematic methodology for the creation of a virtual corpus of travel insurance in English and Spanish based on electronic resources available on the Internet. The representativeness of this corpus will subsequently be determined by using a computer programme specifically designed for this purpose.
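How ReCor measures representativeness is not described in this section. As a rough, hypothetical illustration of one common proxy, the sketch below tracks how many new word types each additional text contributes: when successive texts add few or no new types, the vocabulary curve has levelled off and the corpus is arguably large enough for its domain. This is an assumption about a generic technique, not a description of ReCor itself, and the thresholds are placeholders.

```python
import re


def word_types(text: str) -> set[str]:
    """Distinct lower-cased word forms in a text (letters only)."""
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def vocabulary_growth(texts: list[str]) -> list[int]:
    """Cumulative number of distinct word types after each text is added, in order."""
    seen: set[str] = set()
    curve: list[int] = []
    for text in texts:
        seen |= word_types(text)
        curve.append(len(seen))
    return curve


def has_levelled_off(curve: list[int], window: int = 3, max_new: int = 0) -> bool:
    """True if each of the last `window` additions contributed at most `max_new` new types.

    Both thresholds are illustrative placeholders, not values taken from ReCor.
    """
    if len(curve) <= window:
        return False
    recent = curve[-window - 1:]
    return all(b - a <= max_new for a, b in zip(recent, recent[1:]))


if __name__ == "__main__":
    texts = [
        "The policy covers cancellation.",
        "The policy covers medical expenses.",
        "The policy covers cancellation.",
        "Medical expenses.",
        "The policy.",
    ]
    curve = vocabulary_growth(texts)
    print(curve, has_levelled_off(curve))
```

Because the curve depends on the order in which texts are added, a fuller check would repeat it over several random orderings of the same texts.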
2. Corpora in translation training
The advantages of using corpora in translation have been shown by various studies (cf. Laviosa 1998; Bowker 2002; Bowker and Pearson 2002; Zanettin et al. 2003, amongst others). Their principal advantages include objectivity, reusability and the possibility of putting a single resource to multiple uses. In addition, they are user-friendly and allow huge quantities of information to be accessed and managed in almost no time. Furthermore, the development of our current information society has created a demand, which did not exist previously, for texts written in a variety of languages. Together with economic globalisation, this has resulted in a growing interest6 in the use of bilingual and multilingual corpora by researchers working in the fields of automatic and computer-assisted translation, language teaching, terminology and specialised language, natural language processing and information retrieval as well as, more recently, in training and documentation as applied to translation.
On this last subject, despite the remit of the European project LETRAC7 (Lan-
guage Engineering for Translators Curricula), the use of corpora has only really
come to the attention of researchers working in the field of translation training
relatively recently. Examples of studies that stand out are: Kenny (2001) on the
subject of literary translation based on parallel corpora in German and English;
(c) in an official language of the Member State of the commitment or in any other language
agreed by the parties.”
6. There has been such a flood of compilers in Europe that we are forced to list only some of
the more important examples: ACL (Association for Computational Linguistics); ECI (European
Corpus Initiative); LDC (Linguistic Data Consortium); ICAME (International Computer Archive
of Modern and Medieval English); ACL/DCI (Association for Computational Linguistics Data
Collection Initiative) and ELRA (European Language Resources Association).
7. See <http://www.iai.uni-sb.de/docs/D3.pdf>. In their final report, which was presented to
the European Commission DG XII, the LETRAC project stressed the importance of introduc-
ing the following elements to the curriculum of translation degrees: applied IT, terminology
management programmes, CAT and AT systems, ICTs and linguistic engineering as well as
leaving time for publishing programmes, the Internet, controlled languages, project manage-
ment, translation memories and corpus linguistics.
Corpas Pastor (2001, 2003b, 2004a, b and c) on legal and medical translations
based on multilingual corpora compiled from the Internet; and Sánchez-Gijón
(2003a: NP) on the subject of virtual ad hoc corpora for scientific translations in
the English-Spanish language pair. Other examples of studies are: Bernardini and
Zanettin (2000); Bowker and Pearson (2002); Zanettin, Bernardini and Stewart
(2003) on the possibilities offered by corpora for specialised language teaching.
Two studies that deal with the potential use of corpora in language teaching, natural language processing and translation are Aston (2001) and Granger and Petch-Tyson (2003). Finally, in the R&D project described in Corpas Pastor (2003a) the corpus was used as a fundamental documentation resource for the translation of legal texts, a new avenue of research that was developed further some years later by Seghiri (2006).
Both researchers and teachers agree on the importance of corpora in translation training and practice. Some authors have gone even further and single out virtual corpora (cf. Pearson 1998; Bernardini and Zanettin 2000; Corpas Pastor 2001 and 2004a; Zanettin 2002a and b; Sánchez-Gijón 2003a and b) as one of the translator’s most important aids when faced with a specialised text. By virtual corpus we mean a corpus compiled exclusively from electronic sources in order to carry out a specific translation in any direction (direct, inverse or indirect8). Its principal objective is to construct a reliable resource quickly and at minimal cost, based on texts mined from the Internet, to satisfy the translator’s documentation needs.
Virtual corpora may also be referred to as ad hoc (Corpas Pastor 2001: 164;
Sánchez-Gijón 2003a: 3), disposable (Zanettin 2002a), do-it-yourself/DIY (Zanettin
2002a), domain-specific (Corpas Pastor 2004a: 226), web (Fletcher 2004), electron-
ic (Corpas Pastor 2001; Varantola 2003), ephemeral (Corpas Pastor 2004a: 226),
precision (Varantola 1997); and special purpose (Pearson 1998; Sánchez-Gijón 2003a).
Translators turn to the Internet in search of solutions to information and doc-
umentation problems because they are not only translating between languages
(for which a good dictionary, whether online or not, would suffice), but also
between discourse communities or cultures. In this context, the corpora that translators compile and the Internet itself appear to be two of the most important documentation resources in the practice and research of specialised translation. When facing this
8. A “direct translation” is a translation done directly from the original into the translator’s native language, without an intermediary text; an “inverse translation”, also called “other tongue translation (OTT)”, is a translation from the translator’s native language into another language; finally, an “indirect translation”, also called “mediated translation”, is a translation done via an intermediary translation in a third language, not directly from the original.
kind of assignment, the main problem that translators come up against is that a
corpus for the particular speciality is not available for consultation on the Internet
or, if one already exists, it often does not cover all the information requirements of
the source text. In other words, “one problem with these typically small and do-
main specific corpora is the limited range of topics and text types for which they
are available” (Zanettin 2002a: NP). Faced with this situation, translators have no alternative but to compile their own virtual corpora for each specific translation commissioned.
It is also important to bear in mind that not every set of texts constitutes, in and of itself, a corpus. For a collection of texts to be considered a corpus in the strict sense of the term, it must meet a set of clear design criteria and abide by a specific compilation protocol, so that it may be deemed representative of the field of specialisation or of the particular type of document being translated.
3. Guidelines for corpus creation
In this section we outline the design parameters that the creation of a virtual corpus demands. We then propose a compilation protocol in the form of guidelines, consisting of four distinct phases: (1) locating and accessing resources, (2) downloading data, (3) text formatting and (4) data storage.
3.1 Design criteria
Before moving on to deal specifically with how the documentation resources necessary to create a virtual corpus are located, it is essential for the translator-compiler first to establish a set of clear design criteria. In this case, the objective is to create a corpus of travel insurance policies in Spanish and English compiled exclusively from tourism law resources available on the Internet. This bilingual corpus must be diatopically restricted because of the large number of countries in which English and Spanish are official languages. To illustrate the methodology put forward, the corpus will be restricted to legislation in force (whether Community, national or from the autonomous communities) and to the formal elements of the contract (principally insurance quotes, proposal forms, certificates of insurance and insurance policies9) that have been drawn
9. Another document is the duplicado de la póliza (a duplicate of the policy), which is drawn
up in writing by the insurer if requested by the person who takes out the insurance, the insured
up in Spain, the Republic of Ireland and the United Kingdom (Scotland, Wales,
England and Northern Ireland). In addition, it will be necessary to compile a
comparable corpus, made up of two subcorpora, one in Spanish and the other in
English, which will include the original texts of the tourism contracts. This will
be a textual corpus, i.e. a full-text corpus, since it will include complete texts, and
a specialised corpus, in the sense that it includes specific text types dealing with
communication between specialists and semi-specialists or laymen.
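Design criteria like these can be written down as an explicit specification against which every candidate text is checked before inclusion. The sketch below is a hypothetical encoding: the field names, the country codes and the text-type labels are our own, chosen to mirror the criteria above (diatopic restriction to Spain, Ireland and the United Kingdom; legislation plus the formal elements of the contract; complete texts only), and are not part of the authors' protocol.

```python
from dataclasses import dataclass, field


@dataclass
class CorpusDesign:
    """Hypothetical encoding of the travel insurance corpus design criteria."""
    # Diatopic restriction: language -> admissible countries of origin.
    countries: dict[str, set[str]] = field(default_factory=lambda: {
        "es": {"ES"},          # Spain
        "en": {"UK", "IE"},    # United Kingdom, Republic of Ireland
    })
    # Legislation in force plus the formal elements of the contract.
    text_types: set[str] = field(default_factory=lambda: {
        "legislation", "quote", "proposal form",
        "certificate of insurance", "policy",
    })
    full_text_only: bool = True  # textual (full-text) corpus: no excerpts


@dataclass
class Candidate:
    """Metadata recorded for a text found on the Internet (illustrative fields)."""
    language: str
    country: str
    text_type: str
    complete: bool  # whole document rather than a fragment


def admits(design: CorpusDesign, doc: Candidate) -> bool:
    """Check a candidate text against every design criterion."""
    return (
        doc.country in design.countries.get(doc.language, set())
        and doc.text_type in design.text_types
        and (doc.complete or not design.full_text_only)
    )


if __name__ == "__main__":
    design = CorpusDesign()
    print(admits(design, Candidate("es", "ES", "policy", True)))   # Spanish policy from Spain
    print(admits(design, Candidate("es", "MX", "policy", True)))   # outside the diatopic limits
```

Keeping the criteria in one place makes the later filtering steps reproducible: each downloaded text either satisfies the specification or is rejected for a stated reason.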
A travel insurance corpus compiled in accordance with these design criteria will be essentially unbalanced,10 since quality takes priority over quantity (Corpas Pastor 2004a: 236) in this type of virtual corpus, which is compiled ad hoc. It is, however, extremely homogeneous, given that it has been created for a specific purpose.
3.2 Compilation protocol
Once the preliminary design parameters have been established, the translator-compiler should follow a four-stage protocol for the creation of the corpus, which will now be described.
3.2.1 Locating and accessing resources

The first stage of the protocol consists of locating and accessing information available on the Internet. To do this, the translator-compiler will have to develop and/or put into practice his/her knowledge of electronic resources.
Once the type of electronic corpus has been designed, the question of access to the relevant documents arises. Various possibilities exist for accessing these texts. According to Austermühl (2001: 52 et seq.), there are basically three types of searches that may be carried out on the Internet: institutional searches, carried out on the web sites of international organisations and institutions; thematic searches, normally carried out using directories; and, lastly, key word searches using a search engine.
person or the beneficiary. The insurer is obliged to provide a duplicate or copy of the policy if the original is mislaid; the copy must be identical and have the same validity as the original. In
addition, there is also a document known as the boletín de adhesión (a joining form), a docu-
ment which gives proof of the insurance and has not been included here because it only applies
to life insurance policies.
10. Unbalanced because of the distribution of languages on the Internet. According to the “Top Ten Languages Used in the Web (November 2007)” published by Internet World Stats (http://www.internetworldstats.com/stats7.htm), Spanish speakers represent 9.0% of all Internet users in the world, while English speakers represent 30.1%.
We shall begin with an institutional search,11 one of the most productive types of search for constructing corpora. This is not only because of the great quantity of documents that these institutions, organisations and associations store on the Internet today, but also because these documents can be assumed to be of a high standard in terms of both quality and reliability, since their writers are specialists in the field. This institutional search will be carried out mainly, though not exclusively, on institutional, regulatory and legislative sources. The web sites and web pages that follow may be used to locate legislation.
In terms of official bodies and institutions, legislative information can be taken from the web sites of the ABI (Association of British Insurers),12 the ABTA (Association of British Travel Agents)13 or the FSA (Financial Services Authority)14 for the United Kingdom and Ireland. For Spain, information can be mined from the Mesa del Turismo,15 particularly the section called “legislación general” (general legislation), which includes regulatory laws and laws specifically related to the tourism sector.
Another outstanding web site is that of the WTO (World Tourism Organisation),16 which contains one of the principal documentation resources for legislative material, Lextour.17 This is the WTO’s database of tourism legislation, which has links to web sites, databases and external servers concerned with tourism legislation set up by parliaments, governmental organisations, universities and professional associations. We have also taken information from other databases to obtain Community legislation, such as the well-respected Westlaw.18 However,
11. On numerous occasions, it may be necessary to perform a key word search to find the names of more organisations to be used in the institutional search. This can usually be done by entering descriptors combined with Boolean operators in a search engine such as Google. For example, entering organismo OR turismo, or organismo AND turismo OR “organismo turístico” (“tourism body”), will retrieve more names of organisations connected with tourism, whose web sites can then be visited in order to extract information that may be suitable for inclusion in the travel insurance corpus.
12. Available at <http://www.abi.org.uk>.
13. Available at <http://www.abta.com>.
14. Available at <http://www.fsa.gov.uk/consumer>.
15. Available at <http://www.mesadelturismo.com>.
16. Available at <http://www.world-tourism.org>.
17. Available at <http://www.world-tourism.org/doc/S/lextour.htm>.
18. Available at <http://web2.westlaw.com/signon/default.wl?bhcp=1>.
our most significant source has been EUR-Lex,19 the portal to European Union law, which is currently the best database for European Union law.
Practically all the documents involved in the process of taking out travel insurance may be found on the web sites of the big insurance companies. In addition, although less frequently, the web sites of numerous online travel agencies contain, for their customers’ information, the texts of the policies from various insurance companies that they sell on. Similarly rich sources of information are the web sites of international insurance companies such as Mondial Assistance20 or Europ Assistance,21 British and Irish insurance companies such as AT Bell Insurance Brokers Ltd,22 Royal and Sun Alliance23 or Lloyd’s of London,24 or Spanish insurance companies such as Allianz,25 MAPFRE26 or Ocaso,27 to mention only a few of the most representative examples.
The next step is to carry out thematic searches28 using well-known directories. Here, locating information may be complicated by the structure of the directories themselves, which can even hinder the extraction of documents.
Specialist directories stand out as excellent resources for locating Community, national and autonomous-community legislation, especially when the resources they contain are also evaluated and commented upon. This is the case for the compilation of the Spanish subcorpus, for which we used the section called “Dret” (Catalan for “law”) in the “Indices” of
19. Available at <http://eur-lex.europa.eu>.

20. Available at <http://www.mondial-assistance.com/en/aboutus/homepage.htm>.
21. Available at <http://www.europ-assistance.es/>.
22. Available at <http://www.atbell.co.uk>.
23. Available at <http://www.royalsunalliance.com/royalsun>.
24. Available at <http://www.lloyds.com>.
25. Available at <http://www.allianz.es>.
26. Available at <http://www.mapfre.com/pmapfre/es/index.html>.
27. Available at <http://www.ocaso.es>.
28. As with the institutional search, the thematic search may be complemented by a key word search if it is necessary to find more thematic directories connected with the specialisation being searched. For example, to locate legal directories we would normally go to Google and, using descriptors combined with Boolean operators, enter productive search equations such as “directorio jurídico” (“legal directory”) or directorio AND jurídico.
the Universitat de Barcelona29 and the Universitat Autònoma de Barcelona.30 The directories of The Argus Clearinghouse31 and Search the Law32 (particularly the section “Travel”) are similarly useful for the English subcorpus.
In general, thematic searches based on indices or directories are more productive for extracting legislation than for locating insurance contracts. To locate contracts, it is necessary to take a further step and carry out a key word search, for which a generic search engine such as Google may be used. Many analysts consider Google the best search engine in terms of the quality of its search results (cf. Radev et al. 2005: 580).
Alongside visits to insurance companies’ web sites, key word searches have proved (cf. Seghiri 2006) to be the easiest and quickest way to retrieve the documents that make up insurance contracts. The best results are obtained from search engines when the facilities they offer are exploited. As well as defining the search appropriately, techniques such as Boolean operators, truncation and phrase searches should be considered. On this point, it is clearly essential to establish descriptors. A practical example (cf. Tables 1 and 2)33 is given to illustrate how searches are made to locate the texts that will make up the corpus. To this end, the text types and the field of insurance in which the desired information is to be found (travel insurance) are taken as descriptors, and Boolean search techniques are applied using the user-friendly interface offered by, for instance, Google’s advanced search.34
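The search equations in Tables 1 and 2 follow a simple pattern: single-word descriptors are used bare, multi-word descriptors are put in quotation marks so that they are searched as exact phrases, and the terms are joined with AND (or OR for alternatives). The hypothetical helpers below reproduce that pattern. They only build query strings and do not call any search engine; the parenthesised grouping produced by `any_of` is our assumption, so check how the chosen search engine treats parentheses before relying on it.

```python
def as_term(descriptor: str) -> str:
    """Quote a multi-word descriptor; leave single words and existing groups untouched."""
    d = descriptor.strip()
    if " " not in d or d.startswith(("(", '"')):
        return d
    return f'"{d}"'


def search_equation(*descriptors: str, operator: str = "AND") -> str:
    """Join descriptors with a Boolean operator, e.g. policy AND "travel insurance"."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    return f" {operator} ".join(as_term(d) for d in descriptors)


def any_of(*descriptors: str) -> str:
    """Group alternatives with OR, e.g. (propuesta OR proposición)."""
    return "(" + search_equation(*descriptors, operator="OR") + ")"


if __name__ == "__main__":
    print(search_equation("policy", "travel insurance"))
    print(search_equation("proposal form", "policy", "travel insurance"))
    print(search_equation("póliza", any_of("propuesta", "proposición"), "seguro turístico"))
```

Run as a script, this prints policy AND "travel insurance", then "proposal form" AND policy AND "travel insurance", then póliza AND (propuesta OR proposición) AND "seguro turístico". Generating the equations from the descriptor lists makes it easy to try every combination of text type and field descriptor systematically.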
29. Available at <http://www.bib.ub.es/bub/internet.htm>.

30. Available at <http://www.bib.uab.es/internet.htm>.
31. Available at <http://www.clearinghouse.net>.
32. Available at <http://www.search-the-law.com>.
33. In this table only the descriptors that have produced the greatest number of documents for
the text type we required in the two specific languages (English and Spanish) are shown. How-
ever, it should be pointed out that in reality a vast number of search criteria were used and here
we have only shown a sample by way of illustration.
34. In order to mine the Spanish contractual documents, the version of Google for Spain
(<http://www.google.es>) was used. By selecting the option “páginas de España” it is possible
to filter out any documents that come from other Spanish speaking countries. The same pro-
cedure may be followed to search for information in English, i.e. the user goes to the version
of Google for the United Kingdom (<http://www.google.co.uk>) and for Ireland (<http://www.
google.ie>) and selects the options “pages from the UK” and “pages from Ireland” respectively, in order to exclude documents from other countries. Occasionally, however, this filtering is not sufficient, and when there is doubt about the origin of a document located with Google it may be necessary to check its domain in order to verify its source. Knowing that the domain .es corresponds to Spain, .uk to the United Kingdom and .ie to Ireland is therefore useful. In addition, pages in Spanish with the domain .ar (Argentina) or .mx (Mexico), and pages in English with the domain .au (Australia) or .us (United States), are automatically ruled out because they are not appropriate for our corpus.
Table 1. Descriptors for the finding of the formal elements of travel insurance contracts (Spanish).

| Text type | Descriptors | Search equation |
|---|---|---|
| Póliza | Póliza, seguro turístico, asistencia en viaje35 | póliza AND “seguro turístico” |
|  |  | póliza AND “asistencia en viaje” |
| Solicitud | Solicitud de póliza, seguro turístico, asistencia en viaje | solicitud AND póliza AND “seguro turístico” |
|  |  | solicitud AND póliza AND “asistencia en viaje” |
| Propuesta | Propuesta, proposición, seguro turístico, asistencia en viaje | póliza AND propuesta OR proposición “seguro turístico” |
|  |  | póliza AND propuesta OR proposición “asistencia en viajes” |
| Carta de Garantía | Carta de garantía, seguro turístico, asistencia en viaje | “carta de garantía” AND “asistencia en viaje” |
|  |  | “carta de garantía” AND “seguro turístico” |

Table 2. Descriptors for the finding of the formal elements of travel insurance contracts (English).

| Text type | Descriptors | Search equation |
|---|---|---|
| Policy | Policy, travel insurance | policy AND “travel insurance” |
| Quote | Quote, travel insurance | quote AND policy AND “travel insurance” |
| Proposal Form | Proposal Form, travel insurance | “proposal form” AND policy AND “travel insurance” |
| Certificate of Insurance | Certificate of Insurance, Insurance Certificate, travel insurance | “certificate of insurance” OR “insurance certificate” AND policy |
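The country-domain check described above can be sketched in Python as a filter over candidate URLs. This is only an illustration, not part of the authors' protocol: the function name `classify_url` and the allow-lists are hypothetical, built from the domains mentioned in the text. Pages with generic domains such as .com reveal nothing about their origin, so the sketch flags them for manual checking rather than discarding them.

```python
from urllib.parse import urlparse

# Hypothetical allow-lists built from the country-code domains named in the text.
ALLOWED = {"es": {"es"}, "en": {"uk", "ie"}}
# Domains the text rules out: Argentina, Mexico, Australia, United States.
RULED_OUT = {"ar", "mx", "au", "us"}


def classify_url(url: str, lang: str) -> str:
    """Return 'keep', 'reject' or 'check' for a URL returned by a web search."""
    host = urlparse(url).hostname or ""  # urlparse lower-cases the hostname
    tld = host.rsplit(".", 1)[-1]
    if tld in ALLOWED.get(lang, set()):
        return "keep"
    if tld in RULED_OUT:
        return "reject"
    # Generic domains (.com, .net, .org, ...) say nothing about origin.
    return "check"


if __name__ == "__main__":
    for url in ("http://www.direct-travel.co.uk/FAQ/policy.pdf",
                "http://www.example.com.ar/poliza.html",
                "http://www.example.com/policy.html"):
        print(classify_url(url, "en"), url)
```

British sites under .co.uk or .org.uk still end in .uk, so they pass the English filter.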
35. We refer mainly to seguro turístico or travel insurance, in accordance with the position taken by Aurioles (cf. Aurioles Martín 2005 [2002] and Aurioles Martín et al. 2004), because we believe it to be more accurate than asistencia en viaje, the Spanish calque of the original English travel assistance, since travel assistance is only one possible part of travel insurance, which may also include coverage for holiday cancellation or medical attention, to cite only some of the most common examples. For a wider perspective on this question, see the trilingual (Spanish-English-Italian) classification of travel insurance policies according to coverage outlined by Seghiri (2006: 279–281).
The main difficulty with keyword searches lies in choosing the most precise descriptors for the intended search, since otherwise a large amount of irrelevant information will be returned. It is up to the translator-compiler to filter out all this “noise” from each of the pages that will be included in the corpus.
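The search equations in Tables 1 and 2 AND a text-type descriptor with a quoted subject phrase. A minimal sketch of generating such equations from descriptor lists follows; the helper `build_equations` is hypothetical, and it assumes only the plain AND/OR/quotation-mark syntax shown in the tables, not any search engine's full operator set.

```python
def quote(term: str) -> str:
    """Put multi-word descriptors in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_equations(type_terms, subject_phrases, extra=()):
    """Hypothetical helper: AND together the extra terms, the text-type terms
    (ORed and bracketed when there is more than one) and each subject phrase."""
    type_part = " OR ".join(quote(t) for t in type_terms)
    if len(type_terms) > 1:
        type_part = f"({type_part})"
    return [" AND ".join([*(quote(e) for e in extra), type_part, quote(p)])
            for p in subject_phrases]


if __name__ == "__main__":
    for eq in build_equations(["propuesta", "proposición"],
                              ["seguro turístico", "asistencia en viaje"],
                              extra=["póliza"]):
        print(eq)
```

The explicit brackets make the intended grouping of OR and AND visible, whereas the bracket-free equations in Table 1 leave that grouping to the search engine.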
3.2.2 Downloading data

When the documents have been located and accessed, the next stage is to download the data. This stage is usually performed manually, although it is occasionally possible to automate the task for a group of web pages by using the programme GNU Wget,36 which allows batch downloading.
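For reference, GNU Wget can read a list of URLs from a file with its `-i` (`--input-file`) option, e.g. `wget -i urls.txt`. The same batch idea is sketched below in Python using only the standard library. The helper names `local_name` and `download_batch`, the one-second pause and the file-naming scheme are assumptions for illustration. Two URLs ending in the same file name would overwrite each other, so a real compilation would need a more robust naming scheme.

```python
import time
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def local_name(url: str) -> str:
    """Hypothetical naming scheme: the last segment of the URL path."""
    segment = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    return segment or "index.html"


def fetch(url: str) -> bytes:
    """Download one URL with the standard library (30-second timeout)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_batch(urls, dest: Path, delay: float = 1.0):
    """Save every URL into dest; return (saved paths, failed URLs)."""
    dest.mkdir(parents=True, exist_ok=True)
    saved, failed = [], []
    for url in urls:
        try:
            data = fetch(url)
        except OSError:  # URLError, HTTPError and timeouts are OSError subclasses
            failed.append(url)
            continue
        target = dest / local_name(url)
        target.write_bytes(data)
        saved.append(target)
        time.sleep(delay)  # pause between requests to avoid overloading servers
    return saved, failed
```

A call such as `download_batch(urls, Path("corpus/original/en"))` would store the English originals in one folder and return any URLs that failed, for manual retrieval.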
This downloading phase may be hampered by the inherent structure of the Internet itself. On the one hand, web pages are written in a mark-up language, HTML, so the information is organised in hypertext nodes which are often difficult to access. This is usually because the content has been inappropriately labelled or because the location of the information on the page is hard to see. On the other hand, the wide variety of formats in which the information may appear must also be taken into account.
3.2.3 Text formatting

For both legislation and contracts related to travel insurance, there is a noticeable preference for HTML (.html) and PDF (.pdf). The first of these involves few conversion problems, since the information may simply be copied and pasted into a text document. Google also allows most PDF documents to be viewed in HTML format, so the same procedure can be followed. When this is not possible, conversion programmes such as Solid Converter37 may be used. This third stage is thus completed by what might be called normalisation, since all the documents are converted to ASCII or plain-text format. In other words, they are stripped of HTML or code of any other kind, in accordance with the clean-text policy described by Sinclair (1991: 21).
36. This free software, together with its instruction manual, may be downloaded from the following web site: <http://www.gnu.org/software/wget/>.
37. A trial version of Solid Converter may be downloaded free of charge from <http://www.solidpdf.com>. As a free trial version, it has a number of limitations: it only works for a two-week period and converts a maximum of ten pages per document, although a complete text can be converted over several operations by specifying a different set of pages each time. Other free programs are available online, for instance Pdf to Word converter 3.0 (<http://www.geomundos.com/descargas/bajar-pdf-to-word-converter-30_233.html>), PDF Converter (<http://www.freepdfconvert.com/convert_pdf_to_source.asp>) or Easy PDF to Word Converter (<http://www.pdf-to-html-word.com/>).
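The clean-text normalisation described above can be approximated for HTML with Python's standard `html.parser` module. This is a minimal sketch, not the authors' procedure: the class and function names are hypothetical, PDF conversion is out of scope, and the output is kept as Unicode (UTF-8 when saved) rather than strict ASCII so that Spanish accented characters survive.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &nbsp; etc. into characters
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_plain_text(html: str) -> str:
    """Strip tags and collapse all whitespace (including non-breaking spaces)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()
```

Joining text fragments with spaces keeps words in separate elements apart, but it would also split a word broken by an inline tag (e.g. `<b>Pol</b>icy`), so the output still needs a quick manual check.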
3.2.4 Data storage

The last stage is to store the data. This consists of saving the downloaded documents and correctly identifying and arranging them. One possible way of doing this is to use subfolders depending on whether the documents are in their original format or in ASCII format. These subfolders are then subdivided according to the language, text type and text format of the corpus.
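The folder layout just described can be made explicit with a small helper. The sketch below assumes a hypothetical hierarchy of stage, then language, then text type, then source format; the names `storage_path` and `leaf_folders` and the stage labels are illustrative, not taken from the Turicor corpus.

```python
from itertools import product
from pathlib import PurePosixPath

STAGES = ("original", "ascii")  # documents as downloaded vs. normalised plain text


def storage_path(root, stage, language, text_type, fmt, filename):
    """Hypothetical layout: root/stage/language/text_type/format/filename."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return PurePosixPath(root, stage, language, text_type, fmt, filename)


def leaf_folders(root, languages, text_types, formats):
    """Every folder the layout needs, one per combination of the four criteria."""
    return [PurePosixPath(root, *combo)
            for combo in product(STAGES, languages, text_types, formats)]


if __name__ == "__main__":
    print(storage_path("corpus", "ascii", "en", "policy", "pdf", "direct_travel.txt"))
```

Keeping the source format as a subfolder under the plain-text stage records where each normalised file came from.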
In this study, we have extracted a bilingual comparable corpus from the multilingual Turicor corpus of travel and tourism law, which is described and fully documented at http://turicor.com. It consists of two subcorpora: a Spanish subcorpus with 259 texts38 (1,837,869 words) and an English subcorpus with 302 documents (3,202,118 words).
4. Determining corpus representativeness
Despite repeated reference by the experts to the quality of being “representative”, constituting a “sample” and so forth as distinguishing features of corpora as opposed to other kinds of textual collections, there appears to be no consensus on this crucial issue.
The size of the corpus is a decisive factor in determining whether the sample
is representative in relation to the needs of the research project (cf. Lavid 2005).
38. Regarding the legislative documents that form part of the corpus (17 texts in English and 2 texts in Spanish), it is important to point out that travel insurance is not regulated by substantive legislation. Instead, it falls under the regulations that apply to all insurance other than life insurance, through various Community directives such as 73/239/EEC, 73/240/EEC, 76/580/EEC, 78/473/EEC, 84/641/EEC, 87/343/EEC, 87/344/EEC, 88/357/EEC, 90/618/EEC, 92/49/EEC, 95/26/EEC, 2000/26/EC, 2000/64/EC and 2002/13/EC. In Spain, travel insurance contracts are also currently regulated by the Ley 50/1980, de 8 de octubre, de Contrato de Seguro [Act 50/1980, 8th October, Insurance Contracts] as well as the Ley 30/1995, de 8 de noviembre, de ordenación y supervisión de los Seguros Privados [Act 30/1995, 8th November, Planning and Supervision of Private Insurance]. In Ireland, insurance contracts are regulated by the Insurance Act, 2000, as well as the European Communities (Non-Life Insurance) Framework Regulations, 1994 (S.I. No. 359 of 1994). In the United Kingdom, they are regulated by the Financial Services and Markets Act 2000 (Statutory Instrument 2003 No. 1476), specifically the Amendment (No. 2) Order 2003. As regards policies, the central document in this type of agreement, it was possible to include 101 documents (1,000,067 words) in the Spanish policies component and 176 documents (1,903,661 words) in the English policies component. The remaining formal elements of the contract are included in the rest of the corpus.
However, even today the concept of representativeness is surprisingly imprecise, considering that it is accepted as a central characteristic distinguishing a corpus from any other kind of collection.39 As Biber, one of the most prolific writers on corpus representativeness, emphasises, “a corpus is not simply a collection of texts. Rather, a corpus seeks to represent a language or some part of a language” (Biber et al. 1998: 246). At the same time, however, Biber is conscious of the difficulties involved in compiling a corpus that could be defined as “representative” (cf. Biber et al. 1998: 246–247).
It is therefore commonplace to come up against questions over the minimum
number of texts needed to guarantee that a sample is scientifically valid, as well as
debates over how to specify a sufficient number of texts and number of words for
a corpus (Sanahuja and Silva 2001).
There have been many attempts to set the size, or at least establish the minimum number of texts, from which a specialised corpus may be compiled. Some of the most important are those put forward by Heaps (1978),40 Young-Mi (1995) and Sánchez Pérez and Cantos Gómez (1997). However, some of these authors, such as Cantos (Yang et al. 2000: 21), subsequently recognised shortcomings in these works, suggesting that they might be attributed to the use of Zipf’s law.41 Zipf’s law42 can give us an idea of the breadth of the vocabulary used, but it does not yield a specific or even an approximate number, because this depends on how the constant is determined (Braun 2005 [1996] and Carrasco Jiménez 2003: 3).
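The role of the constant can be seen in Zipf’s law itself: the frequency f of a word is roughly inversely proportional to its rank r, f(r) ≈ C / r, so the product r × f should stay roughly equal to C. The sketch below tabulates that product for a token list. It is an illustration only; tokenisation is assumed to have been done beforehand, and C differs from text to text, which is why the law alone cannot fix a corpus size.

```python
from collections import Counter


def zipf_table(tokens, top=10):
    """(rank, word, frequency, rank * frequency) for the `top` most frequent words.
    Under Zipf's law the last column stays roughly constant."""
    counts = Counter(tokens)
    # Sort by descending frequency; ties broken alphabetically so ranks are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked[:top], start=1)]
```

For the toy list `["a"] * 6 + ["b"] * 3 + ["c"] * 2 + ["d"]`, the products are 6, 6, 6 and 4.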
39. There are a surprising number of research projects that, whilst endeavouring to compile a
“representative” corpus, hardly seem to touch on this concept. Usually, it is noticeable that the
availability of material in the particular field of study determines the final size of the corpus
(Giouli and Piperidis 2002).
40. Indeed, this work gave rise to the rule known as Heaps’ law. Both Zipf’s and Heaps’ laws are used to capture the variability of corpora: Heaps’ law is an empirical law which examines the relationship between vocabulary size, in other words the number of different words (types), and the total number of words in a text (tokens). In this way, the progressive growth of the vocabulary in relation to text length can be observed. The programme ReCor has been validated using this law (cf. Seghiri 2006: 399–403).
41. Conscious of these deficiencies, Yang et al. (2000) attempted to overcome them by taking
a new approach: a mathematical tool capable of predicting the relationship between linguistic
elements in a text (types) and the size of the corpus (tokens). However, at the end of their study,
the authors reflected on some of its limitations, “the critical problem is, however, how to deter-
mine the value of tolerance error for positive predictions” (Yang et al. 2000: 30).
42. For a historical perspective on how Zipf’s law was developed, see Moreiro González (2002).
Numerous studies have been based on this law, but the conclusions they reach do not specify, even graphically, the number of texts necessary to compile a corpus for a particular specialised field (Almahano Güeto 2002: 281).
A possible solution could be to analyse the lexical density of a corpus in relation to the increase in documentary material included. In other words, if the ratio between the number of different words in a text and its total number of words (types/tokens) is an indicator of lexical density or richness, it may be possible to devise a formula that represents lexical density as the corpus grows document by document: once a certain number of texts has been included, the number of types no longer increases in proportion to the number of words (tokens) the corpus contains.
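The document-by-document idea can be sketched as a cumulative type/token ratio: each time a document is added, the running vocabulary and the running word count are updated and their ratio recorded. This is a minimal illustration of the principle, not ReCor’s implementation; the simple lower-casing tokeniser is an assumption.

```python
import re


def tokenize(text):
    """Lower-cased word tokens; a deliberately simple, hypothetical tokeniser."""
    return re.findall(r"\w+", text.lower())


def cumulative_ttr(documents):
    """documents: token lists in the order they are added to the corpus.
    Returns one (documents, types, tokens, types/tokens) row per added document."""
    seen = set()
    n_tokens = 0
    rows = []
    for n_docs, doc in enumerate(documents, start=1):
        seen.update(doc)
        n_tokens += len(doc)
        ratio = len(seen) / n_tokens if n_tokens else 0.0
        rows.append((n_docs, len(seen), n_tokens, ratio))
    return rows


if __name__ == "__main__":
    texts = ["La póliza de seguro", "El seguro de viaje", "La póliza de viaje"]
    for row in cumulative_ttr([tokenize(t) for t in texts]):
        print(row)
```

With these three toy texts the ratio falls from 1.0 to 0.75 to 0.5: the third document adds four tokens but no new types.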
This formula may make it possible to determine the minimum size that a corpus must reach for it to begin to be representative. With the help of graphs, it should be possible to establish whether the corpus is representative and approximately how many documents are necessary to achieve this. This theory has become a practical reality in the shape of a software application, ReCor,43 which enables accurate evaluation of corpus representativeness.
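ReCor’s exact criterion is documented in the publications cited in this section and is not reproduced here. The sketch below substitutes a simple, hypothetical rule for locating the point where the curve levels off: the curve counts as stable once `window` consecutive additions each change the type/token ratio by less than `epsilon`. Both thresholds are arbitrary assumptions.

```python
def stabilisation_point(ratios, epsilon=0.001, window=5):
    """ratios[k] is the type/token ratio after k + 1 documents.
    Return the document count n such that each of the next `window` additions
    changes the ratio by less than epsilon, or None if that never happens."""
    run = 0
    for i in range(1, len(ratios)):
        # Step i goes from i documents to i + 1 documents.
        if abs(ratios[i] - ratios[i - 1]) < epsilon:
            run += 1
            if run == window:
                return i - window + 1
        else:
            run = 0
    return None
```

Fed with the ratio column of a cumulative type/token table, the function returns an estimate of the minimum number of documents; a `None` result means the corpus has not yet levelled off under the chosen thresholds.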

It should be made clear that the method for evaluating the homogeneity of
a very specialised corpus assumes that the target population is known and avail-
able to the researcher. This clearly involves careful design of the corpus in terms
of components, text types to be included, diasystematic limits (diaphasic, di-
astratic, diachronic and diatopic), as well as type of corpus (comparable, parallel,
etc.), number and status of languages, text documentation for DTDs and head-
ers, inter alia.
Once quality has been ensured in terms of corpus design and document selection, this programme can be used to determine a posteriori whether the size reached by a given corpus is sufficiently representative of this particular sector of the tourism industry. For further information, the technology and theoretical assumptions behind the ReCor programme are explained in detail in Seghiri (2006) and Corpas Pastor and Seghiri (2006a, 2006b, 2007a, 2007b and forthcoming).
4.1 The ReCor interface
ReCor’s interface is simple, intuitive and user-friendly (see Figure 1). Firstly, an input file is selected; this could be anything from a particular clause in a policy to the entire corpus. There is also an option, “Filtro de entrada” (input filter), which filters out all the words that the user wants to exclude from the analysis, such as addresses, proper names or even HTML tags if the corpus has not been “cleaned”.
43. ReCor is an acronym derived from the function it was designed for: the representativeness of corpora.
Figure 1. The ReCor interface
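An input filter of this kind can be sketched as a tokeniser that drops leftover tags, user-excluded words and, optionally, numbers. This is a hypothetical equivalent written for illustration, not ReCor’s code; the regular expressions and the name `filter_tokens` are assumptions, and multi-word proper names have to be listed word by word.

```python
import re

TAG = re.compile(r"<[^>]+>")             # HTML tags left in an uncleaned corpus
WORD = re.compile(r"\w+(?:[-'’]\w+)*")   # words, keeping internal hyphens/apostrophes


def filter_tokens(text, excluded=(), drop_numbers=False):
    """Tokenise text, skipping tags, user-excluded words (case-insensitive)
    and, optionally, purely numeric tokens."""
    stop = {w.lower() for w in excluded}
    tokens = []
    for tok in WORD.findall(TAG.sub(" ", text).lower()):
        if tok in stop or (drop_numbers and tok.isdigit()):
            continue
        tokens.append(tok)
    return tokens
```

For example, `filter_tokens("<p>Póliza 2006 de Direct Travel</p>", excluded=["Direct", "Travel"], drop_numbers=True)` keeps only `["póliza", "de"]`.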
Next, three output files are created. The first, “Análisis estadístico” (statistical analysis), collates the results of two separate analyses: one with the files ordered alphabetically by name and another with the files in random order. The resulting document is structured into five columns, which show the number of types, the number of tokens, the ratio between the number of different words and the total number of words (types/tokens), the number of words that appear only once (V1) and the number of words that appear only twice (V2). The second output file, “Palabras ord. alfa.” (words in alphabetical order), has two columns: the first shows the words in alphabetical order and the second their corresponding number of occurrences. The same information is shown in the third file, “Palabras ord. frec.” (words by frequency), but this time the words are ordered according to their frequency, in other words by their rank. The application also allows the user to work with sequences of up to ten words (n-grams)44 and phraseology, and allows numbers to be filtered out.
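The five statistical columns and the two word lists can be reproduced in a few lines for illustration. The sketch assumes tokenised input and joins n-grams with spaces; the function names are hypothetical and the output layout only loosely mirrors ReCor’s files. Alphabetical order here is by Unicode code point, so words beginning with an accented letter (e.g. “ámbito”) sort after “z”; Spanish dictionary order would need locale-aware collation.

```python
from collections import Counter


def ngrams(tokens, n=1):
    """Consecutive token sequences joined with spaces (ReCor allows 1 <= n <= 10)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def analysis_row(tokens, n=1):
    """Types, tokens, type/token ratio, V1 (items seen once) and V2 (seen twice)."""
    counts = Counter(ngrams(tokens, n))
    n_tokens = sum(counts.values())
    n_types = len(counts)
    freq_of_freq = Counter(counts.values())  # how many items occur once, twice, ...
    ttr = n_types / n_tokens if n_tokens else 0.0
    return n_types, n_tokens, ttr, freq_of_freq[1], freq_of_freq[2]


def word_lists(tokens, n=1):
    """Alphabetical list and frequency (rank) list of (item, occurrences)."""
    counts = Counter(ngrams(tokens, n))
    alphabetical = sorted(counts.items())
    by_frequency = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return alphabetical, by_frequency
```

Passing `n=2` to either function repeats the same analysis on two-word sequences.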
44. In this study we used version 2.1 of ReCor. We are currently working on a new version (ReCor 3.0), which can process multiple and very large files more quickly and also allows phraseological units to be identified on the basis of an analysis of the corpus n-grams (1 ≤ n ≤ 10).
4.2 Graphical representation of data
The programme illustrates the level of representativeness of a corpus in a simple graph whose lines fall exponentially at first and then stabilise as they approach zero.45
In the first graph generated by the programme, “Estudio gráfico A” (graphical study A), the number of files selected is shown on the horizontal axis, while the vertical axis shows the type/token ratio. The results of two different operations are shown: one with the files ordered alphabetically (the red line) and the other with the files introduced at random (the blue line). In this way, the programme double-checks that the order in which the texts are introduced does not affect the representativeness of the corpus. Both operations show an exponential decrease as the number of texts selected increases. At the point where both the red and the blue lines stabilise, the corpus can be considered representative, and this point shows approximately how many texts are needed to produce this result.
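The two-ordering check can be sketched as follows: the same documents are added once in alphabetical order of file name and once shuffled, and a cumulative type/token curve is computed for each. Both curves necessarily end at the same value, so what matters is from which document count they stay close. The function names, the fixed random seed and the tolerance are assumptions for illustration.

```python
import random


def ttr_curve(documents):
    """Cumulative type/token ratio after each document (documents = token lists)."""
    seen, tokens, curve = set(), 0, []
    for doc in documents:
        seen.update(doc)
        tokens += len(doc)
        curve.append(len(seen) / tokens if tokens else 0.0)
    return curve


def two_orderings(named_docs, seed=0):
    """named_docs maps file name -> token list. Returns the curve with files in
    alphabetical order and the curve with files in a (reproducible) random order."""
    names = sorted(named_docs)
    alpha = ttr_curve([named_docs[n] for n in names])
    shuffled = names[:]
    random.Random(seed).shuffle(shuffled)
    rand = ttr_curve([named_docs[n] for n in shuffled])
    return alpha, rand


def agreement_point(a, b, tol=0.01):
    """Smallest document count from which the two curves stay within tol."""
    point = None
    for i, (x, y) in enumerate(zip(a, b), start=1):
        if abs(x - y) < tol:
            if point is None:
                point = i
        else:
            point = None
    return point
```

A corpus whose curves only agree near the very last document is a sign that more texts are needed.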
At the same time, another graph is generated, “Estudio gráfico B” (graphical study B), in which the number of tokens is shown on the horizontal axis. This graph can be used to determine the minimum size of the collection in terms of the total number of words.
Once these steps have been taken, it is possible to check whether the number of travel insurance documents assembled in the two languages involved, English and Spanish, is sufficient to affirm that our corpus is representative. Figures 2 and 3 below show the representativeness of the subcorpora in the two languages.
The results generated by ReCor allow us to conclude that the Spanish travel insurance subcorpus (cf. Figure 2) can be considered representative from 140 documents and 1 million words onwards, whereas the English subcorpus needs almost twice as many documents (275) and two and a half times as many words (2.5 million) to reach representativeness (cf. Figure 3). The results remain largely the same when the analysis is performed on a two-word basis (2-grams): the English travel insurance subcorpus (cf. Figure 5) must contain roughly twice as many documents and tokens as the Spanish subcorpus to be deemed representative (cf. Figure 4).
45. It should be noted here that 0 (=zero) is unachievable because of the existence in the text of
variables that are impossible to control such as addresses, proper names or numbers, to name
only some of the more frequently encountered.
Figure 2. Representativeness of the Spanish travel insurance subcorpus (1-gram)
Figure 3. Representativeness of the English travel insurance subcorpus (1-gram)
Furthermore, the quantitative data produced by ReCor allow us to conclude that, despite the absence of substantive legislation on insurance in the tourism industry in either of the legal systems involved, Spanish travel insurance documents tend to be more homogeneous than their English counterparts. In other words, it is possible to infer that the Spanish documents have super-, macro- and microstructures that are very similar to each other, in addition to using a narrower terminological range.
Figure 4. Representativeness of the Spanish travel insurance subcorpus (2-grams)
Figure 5. Representativeness of the English travel insurance subcorpus (2-grams)
5. Using the corpus to translate
A well-constructed virtual corpus facilitates diverse studies on translation as both product and process. Furthermore, one of the most promising uses of corpora is in translation teaching and in learning to translate. Representative virtual corpora provide translators (trainers, trainees and professionals) with a first-rate documentation resource for rendering source texts (STs) into the target language.
In addition, the compilation of a virtual corpus calls for a thorough under-
standing of electronic resources, search skills and data mining techniques from
the Internet, thereby promoting the development of the translator-compiler’s
heuristic sub-competence. Moreover, when a corpus has been appropriately
designed and implemented, we can assume that the compiler has carried out a
preliminary evaluation of information resources, in order to ensure the overall
quality of the textual collection. Evaluation and selection of the documents to be
included in a given corpus will usually speed up the translation and/or revision
process. As a result, translators can devote extra time to decision-making and
problem-solving and focus on these more demanding tasks, instead of repeat-
edly reviewing the reference material. Hence, using corpora as an aid may also
enhance potential users’ overall competence as translators.
5.1 Source text samples
Comparable corpora are particularly useful for meeting translators’ information needs. In the following subsections, we illustrate the value of corpora for finding information on terminology, phraseology, concepts and discourse for the direct and inverse translation of travel insurance policies. To this end, we have selected two extracts from travel insurance policies, one in English and the other in Spanish, as source text (ST) samples.
Extract 1 (ST):46
Important
This is your travel insurance policy. It contains details of cover,
conditions and exclusions relating to each insured person and is the
basis on which all claims will be settled.
46. The extract comes from a travel insurance policy from the British insurance company Direct
Travel Insurance: <http://www.direct-travel.co.uk/FAQ/Wordings/policywording010506.pdf>.
Extract 2 (ST):47
CONDICIONES GENERALES
Artículo Preliminar.-El Contrato de Seguro.-El presente Contrato
de Seguro se rige por lo dispuesto en la Ley 50/1980, de 8 de octubre,
de Contrato de seguro, en la Ley 30/1995, de 8 de Noviembre, de Or-
denación y Supervisión de los Seguros Privados.
5.2 Documentation needs
Even two short ST fragments like those chosen in 5.1 offer abundant evidence to
argue in favour of the use of comparable corpora in the actual translation process.
We are mainly concerned with the terminological and phraseological needs of
translators, the extraction of conceptual or domain information, and the com-
parison of textual and discourse features in the source and target languages.
5.2.1 Terminology and Phraseology

The first problem that a translator may come up against is how to translate the term travel insurance policy (cf. Extract 1). On this point, it should be noted that the term seguro turístico has a long tradition in the Spanish legal system, dating from the publication in 1964 of the Spanish Presidential Decree 3304/64 on insurance contracts for foreign tourists. However, this changed when Council Directive 84/641/EEC of 10 December 1984, amending, particularly as regards tourist assistance, the First Directive (73/239/EEC) on the co-ordination of laws, regulations and administrative provisions relating to the taking-up and pursuit of the business of direct insurance other than life assurance, was transposed into the Spanish legal system through the Ministerial Order of 27 January 1988, which describes coverage of assistance while travelling as part of private insurance. This ministerial order rendered the English term travel assistance with the officially accepted neological calque asistencia en viaje. Since then, this calque from international/Euro English has been incorporated into the Spanish legal system and has supplanted the original seguro turístico, which is much more accurate given that travel assistance is only one possible part of travel insurance coverage. Other risks that may be covered include holiday cancellation and medical assistance, to mention only some of the most frequent.
47. The extract comes from a travel insurance policy from Agrupación Astes, Seguro Turístico, published on the web site of the travel agents Condor Vacaciones S.A.: <http://www.special-tours.com/ficheros/Seguro_Europa_ES.pdf>.
The Spanish corpus also contains two synonyms for the term travel insur-
ance: seguro turístico and seguro de asistencia en viaje, although the frequency
with which they appear varies.
As may be seen, seguro turístico (cf. Figure 6) produces only 15 concordances,48 as compared with 26 for seguro de asistencia en viaje (cf. Figure 7). It should be pointed out that asistencia en viaje appears 107 times. This clearly demonstrates the preference in Spanish for the English calque when drawing up this type of document, as well as the influence of English as the lingua franca par excellence (often referred to as “international legal English”) and its impact on legislation in the field of travel insurance in peninsular Spanish.
Similar problems arise for translators when translating El Contrato de Seguro (cf. Extract 2) into English, as there appear to be two possibilities: assurance contract or insurance contract. A search for contract in the corpus, however, reveals a preference in English for contract of insurance (cf. Figure 8). In addition, when it appears in this particular position in the text, a fixed expression (This is your contract of insurance) can be identified which should be reproduced in translation.
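The concordances in this section were produced with WordSmith Tools 4.0. For readers without access to it, the basic keyword-in-context (KWIC) display can be approximated with a short script. This sketch is not WordSmith’s algorithm: the function name `kwic`, the case-insensitive phrase matching and the fixed character window are assumptions.

```python
import re


def kwic(texts, phrase, width=30):
    """Keyword-in-context lines for every case-insensitive occurrence of phrase."""
    words = phrase.split()
    pattern = re.compile(r"\b" + r"\s+".join(map(re.escape, words)) + r"\b",
                         re.IGNORECASE)
    lines = []
    for text in texts:
        flat = " ".join(text.split())  # collapse line breaks and repeated spaces
        for m in pattern.finditer(flat):
            left = flat[max(0, m.start() - width):m.start()]
            right = flat[m.end():m.end() + width]
            lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines
```

Sorting the returned lines by their right-hand context is a quick way to make fixed expressions such as This is your contract of insurance stand out.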
Figure 6. Concordances for ‘seguro turístico’
48. The analysis of concordances was carried out using WordSmith Tools 4.0.
Figure 7. Concordances for ‘seguro de asistencia en viaje’
Figure 8. Concordances for ‘contract’
The next problem that could arise for the translator is how to translate the English cover, conditions and exclusions (cf. Extract 1) into Spanish. A search in the Spanish corpus for the literal translation condiciones, coberturas y exclusiones shows only one concordance. On this point, it is important to remember that legal language is characterised not only by its precision, but also by its formulaic and extremely conservative style. The translator should be aware of the abundance of verbose and often redundant phraseological units and other fixed expressions, and of the archaic or conventional forms that these texts contain, often with the sole purpose of making them appear more grandiose. Finally, the Spanish corpus revealed that the term exclusiones is always found as part of the phraseological unit límites y exclusiones (or else garantías, límites y exclusiones), as the concordances for exclusiones show (cf. Figure 9).
Figure 9. Concordances for ‘exclusiones’
Figure 10. Concordances for ‘conditions’
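Observations such as “exclusiones always appears in límites y exclusiones” amount to counting the words that immediately precede a node word. A minimal sketch follows, assuming a simple lower-cased word tokeniser; the function name `left_collocates` and the default span of two words are hypothetical.

```python
import re
from collections import Counter


def left_collocates(texts, node, span=2):
    """Count the `span`-word sequences immediately preceding each occurrence of node."""
    counts = Counter()
    node = node.lower()
    for text in texts:
        words = re.findall(r"\w+", text.lower())
        for i, w in enumerate(words):
            if w == node and i >= span:
                counts[" ".join(words[i - span:i])] += 1
    return counts
```

With `span=2`, the sentences “Garantías, límites y exclusiones del seguro.” and “Véase límites y exclusiones.” both yield the left context “límites y”; widening the span to three separates the longer unit garantías, límites y exclusiones.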
A similar problem may be encountered by the translator when translating CONDICIONES GENERALES (cf. Extract 2) into English. A search in the corpus for conditions shows that in English the construction General Terms and Conditions (cf. Figure 10), with initial capital letters, is preferred in most cases.
5.2.2 Conceptual information

In English, the policies always refer to the insured person (cf. Extract 1), whereas the Spanish legal system recognises several parties. It may therefore be useful to distinguish between the asegurado (the insured person), the tomador (the person who takes out the insurance) and the beneficiario (the beneficiary of the insurance). The asegurado is the person (either natural or legal) who is exposed to a particular risk, either to his person or to his property or assets. In other words, the asegurado is the subject of the contract, whether in his person (in the case of life insurance or pensions, for example) or his property (in the case of house insurance or fire insurance, among others). The tomador is the person who takes out the insurance and pays the premiums, but is not necessarily the beneficiary. The beneficiario is the person specified in the policy as the recipient of the assistance or compensation covered by the insurance.
The corpus may therefore also be used to clarify concepts and, as a result,
identify which person is being referred to in Spanish. Hence, a search in the cor-
pus based on the expression insured person (cf. Figure 11) shows definitions such
as “Insured person, you, your – each person who an insurance premium has been
paid for as shown on the policy schedule”.
It may, therefore, be concluded that the English term insured person should
be translated as Asegurado with a capital letter as illustrated by the information
shown from the corpus (cf. Figure 12). The option persona asegurada, with 20 oc-
currences, may be ruled out in favour of Asegurado or Asegurados with 5,692 and
646 occurrences respectively.
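Choosing between candidates such as Asegurado, Asegurados and persona asegurada depends on case-sensitive, whole-word counts, since capitalisation is itself part of the convention. The sketch below is a hypothetical helper for such comparisons; it uses lookarounds so that Asegurado is not also counted inside Asegurados.

```python
import re


def count_variants(texts, variants):
    """Case-sensitive whole-phrase counts for each candidate equivalent."""
    totals = {}
    for v in variants:
        # (?<!\w) and (?!\w) stop 'Asegurado' from matching inside 'Asegurados'.
        pattern = re.compile(r"(?<!\w)" + re.escape(v) + r"(?!\w)")
        totals[v] = sum(len(pattern.findall(t)) for t in texts)
    return totals
```

The preferred equivalent is then simply `max(totals, key=totals.get)`.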
In the case of the Spanish fragment (cf. Extract 2), the main problem lies in rendering the references to legislation: Ley 50/1980, de 8 de octubre, de Contrato de seguro, en la Ley 30/1995, de 8 de Noviembre, de Ordenación y Supervisión de los Seguros Privados. Here it may be helpful to remember that, although there is no substantive Community legislation on travel insurance, the contract may be subject to the national regulations of the countries that the contracting parties come from. If the client wants the translation adapted to the British legal system, the translator can use the corpus to find the necessary information. The results of a search in the English subcorpus for law (cf. Figure 13; legislation was also searched, but produced no occurrences) show a substantial difference from the way legislation is expressed in Spanish. Whereas Spanish is much more precise, English prefers a more generic means of expression, referring solely to English Law and making no mention of the specific regulations that apply. In addition, on the subject of legislation, the English formula Law applicable does not coincide with the Spanish opening formula Artículo preliminar. This question will be dealt with in the following section (cf. 5.2.3).
Figure 11. Definition of ‘insured person’
Figure 12. Concordances for ‘asegurado’
Figure 13. Concordances for ‘law’
5.2.3 Textual conventions

Finally, the preliminary documentation work involves carrying out searches fo-
cusing on the typology of the text to be translated. In this case our intention was
to find typical opening formulas in the travel insurance policies in Spanish equiv-
alent to the English Important (cf. Extract 1). We therefore searched for concor-
dances in Spanish based on Importante. The results show that the typical opening
formula for this section in Spanish is not Importante but MUY IMPORTANTE
with the whole sequence in capital letters (cf. Figure 14).
In the case of the Spanish text (cf. Extract 2), the typical opening formula
consists of a preliminary article (Artículo Preliminar) which contains references
to the relevant legislation. However, the corpus shows that English travel insurance policies have their own formula, Law applicable, which generally appears in the last paragraph of the policy and therefore constitutes a closing formula rather than an opening formula as in Spanish.
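Whether a formula opens or closes a document can be checked across the corpus by measuring where it occurs relative to the length of each text. A minimal sketch follows, assuming plain-text documents; the names `relative_position` and `mean_position` are hypothetical, and the measure is based on character offsets, not paragraphs.

```python
import re


def relative_position(text, formula):
    """Start of the first (case-insensitive) occurrence of formula as a fraction
    of the text length: 0.0 is the very beginning, values near 1.0 the end."""
    match = re.search(re.escape(formula), text, re.IGNORECASE)
    if match is None or not text:
        return None
    return match.start() / len(text)


def mean_position(documents, formula):
    """Average relative position over the documents that contain the formula."""
    positions = [p for p in (relative_position(d, formula) for d in documents)
                 if p is not None]
    return sum(positions) / len(positions) if positions else None
```

A mean close to 1.0 for Law applicable in the English subcorpus, and close to 0.0 for Artículo preliminar in the Spanish one, would confirm the contrast described above.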
Figure 14. Concordances for ‘importante’
5.3 Target text samples
Once all the necessary information has been gathered from the travel insurance
corpus, the translator is in a position to offer a translation of both extracts. It is es-
sential to take into account all the points that have been outlined so far given their
importance when it comes to segmenting and reorganising the information in the
target text (TT). The following are suggested translations of Extracts 1 and 2.
Extract 1 (TT):
MUY IMPORTANTE
Esta es su póliza de asistencia en viaje. En ella se incluyen las
garantías, límites y exclusiones de los Asegurados y a partir de las cuales
podrá efectuarse cualquier reclamación.
Extract 2 (TT):
General Terms and Conditions

This is your travel insurance contract.
Law applicable: This policy is subject to Spanish law.
6. Conclusion
We would like to begin our concluding remarks by quoting Zanettin (2002a: NP):
Recent research in translation studies has stressed the contribution which cor-
pora of electronic texts can bring to translators. By using appropriate software
translators can look up words in a matter of seconds, and highlight patterns by
sorting contexts around search words. If a corpus is appropriately designed, it can
provide reliable evidence of authentic linguistic behaviour and text-structuring
conventions by highlighting recurrent patterns. Terminological and collocational
information can be especially useful.
As we have seen, a large part of the translator’s documentation needs can be met through the compilation and/or management of comparable virtual corpora. As a result, translators gain a great deal by becoming both corpus compilers and corpus users. The heuristic tasks involved in selecting the systems used to mine the information, together with the parallel task of finding the information to be taken from the Internet, are an authentic exercise in applied documentation. This in turn develops the translator’s documentation competence and, as a result, linguistic-textual competence.
At the same time, a well-planned virtual corpus that complies with appropriate design criteria and which is representative in terms of the type of target text
that is required may contribute to the development of translators’ overall com-
petence. The preparatory tasks involved in selecting and evaluating information
sources lead to obvious savings in terms of time and effort that allow the transla-
tor to focus on other issues that require more attention, such as taking decisions
or evaluating different translation options.
In this article we have focused on the use of virtual corpora as the documentation resource par excellence in specialist translation training. However, the methodology behind corpus compilation is not always very clear, and all too often the availability of documents on the Internet is the crucial criterion that determines the size of the collection of texts. If a collection of texts is to qualify as a “corpus” and be considered representative of a particular field, it is therefore essential that it conform to clear design parameters, set out from the beginning, and that it be compiled according to a specific protocol. This protocol is divided into four distinct phases: (a) locating and accessing resources; (b) downloading data; (c) text formatting; and (d) data storage.
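Phases (c) and (d) of this protocol lend themselves to automation. The sketch below assumes that pages have already been located and downloaded as HTML in phases (a) and (b), and that each document is to be stored as a plain UTF-8 text file. The folder layout `<corpus_dir>/<lang>/<doc_id>.txt` and all function and class names are hypothetical choices made for illustration. They are not part of the protocol described above.

```python
import re
import tempfile
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect the text content of a page, skipping <script> and <style> blocks."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Phase (c): reduce a downloaded HTML page to plain text with normalised spacing."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def store(text, corpus_dir, lang, doc_id):
    """Phase (d): save the text as <corpus_dir>/<lang>/<doc_id>.txt in UTF-8."""
    folder = Path(corpus_dir) / lang
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{doc_id}.txt"
    path.write_text(text, encoding="utf-8")
    return path


# Hypothetical downloaded page; a temporary folder stands in for the corpus directory.
page = ("<html><head><style>p {color: red}</style></head><body>"
        "<h1>Travel insurance</h1><p>Cover for  lost luggage.</p></body></html>")

with tempfile.TemporaryDirectory() as corpus_dir:
    saved = store(html_to_text(page), corpus_dir, "en", "doc001")
    saved_name = saved.name
    saved_text = saved.read_text(encoding="utf-8")

print(saved_name, "->", saved_text)
```

Keeping one folder per language makes it straightforward to load each subcorpus of a comparable corpus separately. A real protocol would probably also record metadata for each document, such as source URL, date of retrieval and text type.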
Corpus representativeness may also be measured a posteriori using ReCor, a computer programme that calculates the minimum number of documents and words a specialised language corpus should contain for it to be considered representative. It should be pointed out that the minimum number of documents for a given corpus cannot be established a priori. The size will depend on the language and text types involved, as well as on the restrictions of the particular specialised field and any other diasystematic limitations.
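ReCor’s own procedure is not reproduced here. As a simplified illustration of the idea behind an a posteriori check, the sketch below tracks how many new word types each additional document contributes. It reports the point at which the vocabulary stops growing, and the growth curve also gives the cumulative number of tokens at each step. The tokeniser, the threshold `max_new_types` and the `run` length are assumptions chosen for the example, not parameters taken from ReCor.

```python
import re


def tokens(text):
    """Crude lower-cased word tokeniser (an assumption; real tools tokenise more carefully)."""
    return re.findall(r"\w+", text.lower())


def growth_curve(documents):
    """For each document added in turn, return (cumulative tokens, cumulative types, new types)."""
    seen, total, curve = set(), 0, []
    for doc in documents:
        toks = tokens(doc)
        new_types = set(toks) - seen
        seen |= new_types
        total += len(toks)
        curve.append((total, len(seen), len(new_types)))
    return curve


def saturation_point(documents, max_new_types=0, run=2):
    """Hypothetical stopping rule: return how many documents had been added before
    `run` consecutive documents each contributed at most `max_new_types` new types,
    or None if the vocabulary never levels off."""
    streak = 0
    for i, (_, _, new) in enumerate(growth_curve(documents), start=1):
        streak = streak + 1 if new <= max_new_types else 0
        if streak == run:
            return i - run
    return None


# Invented miniature "documents" for illustration only.
docs = [
    "The policy covers lost luggage.",
    "The policy covers flight delays.",
    "Lost luggage, flight delays.",
    "The policy covers delays.",
]
print(growth_curve(docs))
print(saturation_point(docs))
```

The result depends on the order in which documents are added. A more robust check might therefore repeat the calculation over several orderings of the same documents.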
Virtual comparable corpora constructed in accordance with the protocol outlined in this study are extremely useful for studying the discourse of the field of specialisation under examination, the way this discourse manifests itself in the respective documents, and the forms these texts take in practice. This utility can be seen from a monolingual and monocultural perspective as well as from the point of view of translation, comparison, and interlinguistic and intercultural mediation. The virtual corpus may therefore be viewed as a highly effective tool in specialised translation training. It promotes autonomous teaching and learning processes by establishing appropriate mechanisms for the translator’s specialisation and diversification. In addition, it can be used to study texts that students have translated in order to correct and validate translation assignments, and it has many other possible uses that are still to be discovered.
References
ACT. 2005. Primer estudio de mercado de los servicios de traducción profesional en España de la
Asociación de Empresas de Traducción (ACT). Madrid: ACT.
Almahano Güeto, I. 2002. El contrato de viaje combinado en alemán y español: Las condiciones
generales. Un estudio basado en corpus. PhD Thesis. Málaga: Universidad de Málaga.
Aston, G. (ed.). 2001. Learning with Corpora. Bologna: CLUEB.
Aurioles Martín, A. 2005 [2002]. Introducción al Derecho Turístico (Derecho Privado del Turis-
mo). Madrid: Tecnos.
Aurioles Martín, A., Benavides Velasco, P. G. and González Fernández, M. B. 2004. Contrata-
ción Turística. Technical document BFF2003-04616 MCYT/TI-DT-2004-1. 1–12. <http://
turicor.com/privada/documentos/TI-DT-2004-1.pdf>. [14/03/2007].
Austermühl, F. 2001. Electronic Tools for Translators. Manchester: St. Jerome.
Bernardini, S. and Zanettin, F. (eds). 2000. I corpora nella didattica della traduzione. Corpus Use
and Learning to Translate. Bologna: CLUEB.
Biber, D., Conrad, S. and Reppen, R. 1998. Corpus Linguistics: Investigating Language Structure
and Use. Cambridge: Cambridge University Press.
Bowker, L. 2002. Computer-Aided Translation Technology: A Practical Introduction. Ottawa:
University of Ottawa Press.
Bowker, L. and Pearson, J. 2002. Working with Specialized Language: A practical guide to using
corpora. London: Routledge.
Braun, E. 2005 [1996]. “El caos ordena la lingüística. La ley de Zipf.” In Caos, fractales y cosas
raras, E. Braun (ed.). Mexico D.F.: Fondo de Cultura Económica. <http://omega.ilce.edu.
mx:3000/sites/ciencia/volumen3/ciencia3/150/htm/caos.htm> [14/03/2007].
Carrasco Jiménez, R. C. 2003. La ley de Zipf en la Biblioteca Miguel de Cervantes. Alicante: Uni-
versidad de Alicante. <http://www.dlsi.ua.es/asignaturas/aa/Zipf.pdf> [14/03/2007].
CORIS/CODIS. 2006. “Progettazione e costruzione di un Corpus di Italiano Scritto.” CO-
RIS/CODIS. Bologna: CILTA. <http://corpus.cilta.unibo.it:8080/coris_itaProgett.html>
[14/03/2007].
Corpas Pastor, G. 2001. “Compilación de un corpus ad hoc para la enseñanza de la traducción
inversa especializada.” Trans: Revista de Traductología 5: 155–184.
Corpas Pastor, G. (ed.) 2003a. Recursos documentales y técnicos para la traducción del discurso
jurídico (español, alemán, inglés, italiano, árabe). Granada: Comares.
Corpas Pastor, G. 2003b. “Diseño de un tipologizador para la traducción jurídica: Del corpus
al prototipo textual.” In Recursos documentales y técnicos para la traducción del discurso
jurídico (español, alemán, inglés, italiano, árabe), G. Corpas Pastor (ed.), 33–58. Granada:
Comares.
Corpas Pastor, G. 2004a. “Localización de recursos y compilación de corpus vía Internet: Apli-
caciones para la didáctica de la traducción médica especializada.” In Manual de documen-
tación y terminología para la traducción especializada, C. Gonzalo García and V. García
Yebra (eds), 223–257. Madrid: Arco/Libros.
Corpas Pastor, G. 2004b. “The Turicor Project: Work in Progress.” Revista Europea de Derecho
de la Navegación Marítima y Aeronáutica xx: 1–14. <http://turicor.com/pdf/corpas2004b.
pdf> [14/03/2007].
Corpas Pastor, G. 2004c. “La traducción de textos médicos especializados a través de recursos
electrónicos y corpus virtuales.” In Las palabras del traductor. Actas del II Congreso Inter-
nacional «El español, lengua de traducción», 20 y 21 de mayo, Toledo 2004, L. González and
P. Hernúñez (eds), 137–164. Brussels: Comisión Europea/ESLETRA. <http://www.turicor.
com/pdf/corpas2004c.pdf> [14/03/2007].
Corpas Pastor, G. and Seghiri, M. 2006a. El concepto de representatividad en la Lingüística del
Corpus: Aproximaciones teóricas y metodológicas. Technical document BFF2003-04616
MCYT/TI-DT-2006-1.
Corpas Pastor, G. and Seghiri, M. 2006b. “Recursos documentales para la traducción de se-
guros turísticos en el par de lenguas inglés-español.” In Investigación y traducción: Una
mirada al presente en la labor investigadora y en el ejercicio de la profesión de la licenciatura
Traducción e Interpretación, E. Postigo Pinazo (ed.). Málaga: Universidad de Málaga.
Corpas Pastor, G. and Seghiri, M. 2007a. “Specialized Corpora for Translators: A Quantitative
Method to Determine Representativeness.” Translation Journal 11 (3). <http://translationjournal.net/journal/41corpus.htm> [14/03/2007].
Corpas Pastor, G. and Seghiri, M. 2007b. “Determinación del umbral de representatividad de
un corpus mediante el algoritmo N-Cor.” Procesamiento del Lenguaje Natural 39: 165–172.
<http://www.sepln.org/revistaSEPLN/revista/39/20.pdf> [14/03/2007].
Corpas Pastor, G. and Seghiri, M. Forthcoming. El concepto de representatividad en lingüística
de corpus: Aproximaciones teóricas y consecuencias para la traducción. Málaga: Servicio de
Publicaciones de la Universidad.
Council Directive 73/240/EEC of 24 July 1973 abolishing restrictions on freedom of establish-
ment in the business of direct insurance other than life assurance.
Council Directive 76/580/EEC of 29 June 1976 amending Directive 73/239/EEC on the coor-
dination of laws, regulations and administrative provisions relating to the taking up and
pursuit of the business of direct insurance other than life assurance.
Council Directive 78/473/EEC of 30 May 1978 on the coordination of laws, regulations and
administrative provisions relating to Community co-insurance.
Council Directive 84/641/EEC of 10 December 1984 amending, particularly as regards tourist
assistance, the First Directive (73/239/EEC) on the coordination of laws, regulations and
administrative provisions relating to the taking-up and pursuit of the business of direct
insurance other than life assurance.
Council Directive 87/343/EEC of 22 June 1987 amending, as regards credit insurance and sure-
tyship insurance, First Directive 73/239/EEC on the coordination of laws, regulations and
administrative provisions relating to the taking-up and pursuit of the business of direct
insurance other than life assurance.
Council Directive 87/344/EEC of 22 June 1987 on the coordination of laws, regulations and
administrative provisions relating to legal expenses insurance.
Council Directive 90/618/EEC of 8 November 1990, amending, particularly as regards motor
vehicle liability insurance, first Council Directive 73/239/EEC and second Council Direc-
tive 88/357/EEC on the coordination of laws, regulations and administrative provisions
relating to direct insurance other than life assurance.
Council Directive 92/49/EEC of 18 June 1992 on the coordination of laws, regulations and ad-
ministrative provisions relating to direct insurance other than life assurance and amending
Directives 73/239/EEC and 88/357/EEC (third non-life insurance Directive).
Council Directive 92/96/EEC of 10 November 1992 on the coordination of laws, regulations
and administrative provisions relating to direct life assurance and amending Directives
79/267/EEC and 90/619/EEC (third life assurance Directive).
Directive 2000/26/EC of the European Parliament and of the Council of 16 May 2000 on the
approximation of the laws of the Member States relating to insurance against civil liability
in respect of the use of motor vehicles and amending Council Directives 73/239/EEC and
88/357/EEC.
Directive 2000/64/EC of the European Parliament and of the Council of 7 November 2000
amending Council Directives 85/611/EEC, 92/49/EEC, 92/96/EEC and 93/22/EEC as re-
gards exchange of information with third countries.
Directive 2002/13/EC of the European Parliament and of the Council of 5 March 2002 amend-
ing Council Directive 73/239/EEC as regards the solvency margin requirements for non-
life insurance undertakings.
Directive 2002/92/EC of the European Parliament and of the Council of 9 December 2002 on
insurance mediation.
European Parliament and Council Directive 95/26/EC of 29 June 1995 amending Directives
77/780/EEC and 89/646/EEC in the field of credit institutions, Directives 73/239/EEC and
92/49/EEC in the field of non-life insurance, Directives 79/267/EEC and 92/96/EEC in the
field of life assurance, Directive 93/22/EEC in the field of investment firms and Directive
85/611/EEC in the field of undertakings for collective investment in transferable securities
(Ucits), with a view to reinforcing prudential supervision.
First Council Directive 73/239/EEC of 24 July 1973 on the coordination of laws, regulations
and administrative provisions relating to the taking-up and pursuit of the business of di-
rect insurance other than life assurance.
Fletcher, W. H. 2004. “Facilitating the Compilation and Dissemination of Ad-Hoc Web Corpora.” In The Fifth International Conference on Teaching and Language Corpora, G. Aston,
S. Bernardini and D. Stewart (eds), 1–18. Amsterdam: Benjamins. <http://www.kwicfind-
er.com/Facilitating_Compilation_and_Dissemination_of_Ad-Hoc_Web_Corpora.pdf>
[14/03/2007].
Giouli, V. and Piperidis, S. 2002. Corpora and HLT. Current trends in corpus processing and annotation. Bulgaria: Institute for Language and Speech Processing. <http://www.larflast.bas.
bg/balric/eng_files/corpora1.php> [14/03/2007].
Granger, S. and Petch-Tyson, S. (eds). 2003. Extending the Scope of Corpus-Based Research: New
Applications, New Challenges. Amsterdam and Atlanta: Rodopi.
Heaps, H. S. 1978. Information Retrieval: Computational and Theoretical Aspects. New York:
Academic Press.
Insurance Act 2000.
Kenny, D. 2001. Lexis and Creativity in Translation. A Corpus-based Study. Manchester: St.
Jerome.
Lavid López, J. 2005. Lenguaje y nuevas tecnologías: nuevas perspectivas, métodos y herramientas
para el lingüista del siglo XXI. Madrid: Cátedra.
Laviosa, S. (ed.). 1998. L’approche basée sur le corpus / The Corpus-based Approach, Meta 43 (4).
Ley 18/1997, de 13 de mayo, de modificaciones del artículo 8 de la Ley de Contrato de Seguro,
para garantizar la plena utilización de todas las lenguas oficiales en la redacción de los
contratos. BOE. 0115 de 14 de mayo de 1997.
Ley 30/1995, de 8 de noviembre, de ordenación y supervisión de los Seguros Privados.
Ley 50/1980, de 8 de octubre, del Contrato de Seguro.
Moreiro González, J. A. 2002. “Aplicaciones al análisis automático del contenido provenientes
de la teoría matemática de la información.” Anales de documentación 5: 273–286. <http://
www.um.es/fccd/anales/ad05/ad0515.pdf> [14/03/2007].
Orden Ministerial de 27 de enero de 1988 por la que se califica la cobertura de las prestaciones
de asistencia en viaje como operación de seguro privado.
Pearson, J. 1998. Terms in Context, Studies in Corpus Linguistics. Amsterdam/Philadelphia:
John Benjamins.
Radev, D., Fan, W., Qi, H., Wu, H. and Grewal, A. 2005. “Probabilistic question answering on
the web.” Journal of the American Society for Information Science and Technology (JASIST)
56 (6): 571–583. <http://filebox.vt.edu/users/wfan/paper/www/www.pdf> [14/03/2007].
Sanahuja, S. and Silva, A. 2001. “Muestreo teórico y estudios del discurso. Una propuesta teóri-
co-metodológica para la generación de categorías significativas en el campo del Análisis
del Discurso.” El Estudio del Discurso: Metodología Multidisciplinaria. II Coloquio Nacional
de Investigadores en Estudios del Discurso. La Plata, 6 al 8 de septiembre de 2001. Buenos
Aires: Asociación Latinoamericana de Estudios del Discurso and Universidad Nacional
del Centro de la Provincia de Buenos Aires. <http://www.sai.com.ar/KUCORIA/discurso.
html> [14/03/2007].
Sánchez-Gijón, P. 2003a. “És la web pública la nova biblioteca del traductor?” Tradumàtica:
Traducció i tecnologies de la informació i la comunicació 2: 1–7. <http://www.bib.uab.es/
pub/tradumatica/15787559n2a7.pdf> [14/03/2007].
Sánchez-Gijón, P. 2003b. Els documents digitals especialitzats: utilització de la lingüística de cor-
pus com a font de recursos per a la traducció. PhD Thesis. Barcelona: Universidad Autóno-
ma de Barcelona.
Sánchez Pérez, A. and Cantos Gómez, P. 1997. “Predictability of Word Forms (Types) and Lem-
mas in Linguistic Corpora. A Case Study Based on the Analysis of the CUMBRE Corpus:
An 8-Million-Word Corpus of Contemporary Spanish.” International Journal of Corpus Linguistics 2 (2): 259–280.
Second Council Directive 88/357/EEC of 22 June 1988 on the coordination of laws, regulations
and administrative provisions relating to direct insurance other than life assurance and
laying down provisions to facilitate the effective exercise of freedom to provide services
and amending Directive 73/239/EEC.
Seghiri, M. 2006. Compilación de un corpus trilingüe de seguros turísticos (español-inglés-ital-
iano): aspectos de evaluación, catalogación, diseño y representatividad [Compilation of a
trilingual corpus of travel insurance contracts (English-Italian-Spanish): evaluation, classifi-
cation, design and representativeness]. PhD Thesis. Málaga: Universidad de Málaga.
Sinclair, J. M. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
The Financial Services and Markets Act 2000 (Regulated Activities).
The Insurers (Reorganisation and Winding Up) Regulations 2004.
Varantola, K. 1997. “Translators, dictionaries and text corpora.” In I corpora nella didattica della
traduzione, S. Bernardini and F. Zanettin (eds), 117–133. Bologna: CLUEB.
WTTC. 2006a. World Travel and Tourism climbing to new heights. The 2006 Travel & Tour-
ism Economic Research. London: World Travel & Tourism Council. <http://www.wttc.
org/2006TSA/pdf/World.pdf> [14/03/2007].
WTTC. 2006b. United Kingdom Travel and Tourism climbing to new heights. The 2006 Travel &
Tourism Economic Research. London: World Travel & Tourism Council. <http://www.wttc.
org/2006TSA/pdf/United%20Kingdom.pdf> [14/03/2007].
WTTC. 2006c. Ireland Travel and Tourism climbing to new heights. The 2006 Travel & Tour-
org/2006TSA/pdf/Ireland.pdf> [14/03/2007].
WTTC. 2006d. Italy Travel and Tourism climbing to new heights. The 2006 Travel & Tourism
Economic Research. London: World Travel & Tourism Council. <http://www.wttc.org/
2006TSA/pdf/Italy.pdf> [14/03/2007].
WTTC. 2006e. Spain Travel and Tourism climbing to new heights. The 2006 Travel & Tour-
org/2006TSA/pdf/Spain.pdf> [14/03/2007].
Yang, D., Cantos Gómez, P. and Song, M. 2000. “An Algorithm for Predicting the Relationship
between Lemmas and Corpus Size.” ETRI Journal 22 (2): 20–31. <http://etrij.etri.re.kr/
Cyber/servlet/GetFile?fileid=SPF-1042453354988> [14/03/2007].
Young-Mi, Jeong. 1995. “Statistical Characteristics of Korean Vocabulary and Its Application.”
Lexicographic Study 5 (6): 134–163.
Zanettin, F. 2002a. “DIY Corpora: The WWW and the Translator.” In Training the Language
Services Provider for the New Millennium, B. Maia, J. Haller and M. Ulrych (eds). Porto:
Faculdade de Letras, Universidade do Porto. <http://www.federicozanettin.net/DIYcorpora.htm> [14/03/2007].
Zanettin, F. 2002b. “CEXI. Designing an English Italian Translational Corpus.” In Teaching and
Learning by Doing Corpus Analysis, B. Kettemann and G. Marko (eds), 329–343. Amsterdam: Rodopi.
Zanettin, F., Bernardini, S. and Stewart, D. (eds). 2003. Corpora in translator education. Manchester: St. Jerome.
Developing documentation skills to build
do-it-yourself corpora in the specialised
translation course
This chapter presents the case for systematic use of do-it-yourself corpora in
specialised translation courses, focusing in particular on the use of corpora as a
documentation resource. An overview is given of the importance of documen-
tation in professional translation, its place in different translation competence
models and the advantages and disadvantages of how it is taught in different
translator training centres. Having reached the conclusion that documentation
skills for translation are best acquired in a translation course as a tool to solve
specific translation problems, the author suggests a protocol to help students
create their own DIY corpus in specialised translation courses. The proposal is
illustrated by examples of problems related to the translation of an instruction
manual for an air conditioning system.
Key words: Do-it-yourself corpora, documentation skills, specialised translation, specialised translation training, translation training
1. Specialised translation and the translator’s needs
Documentation skills are of vital importance to translators of texts in any discipline, so developing them is particularly relevant to translator training. Within specialised translation in particular, the use of corpora
as a documentation resource is becoming increasingly popular (Lüdeling, Evert
and Baroni 2007; Maia 2003; Varantola 2003; Zanettin 2002; amongst many oth-
ers). Translation competence within the field of specialised translation requires
skills and knowledge that Nord (1991: 46) identifies as relevant translation com-
petences, and which Askehave (2000) summarises as follows:
110 Pilar Sánchez-Gijón
– Linguistic competence in source and target languages. This competence comprises formal and semantic aspects of vocabulary and grammar, as well as pragmatic aspects of text type, genre, register and style.
– Cultural competence, which provides the necessary knowledge of source and
target language cultures.
– Domain-specific competence within the context of specialist knowledge.
– Instrumental competence for the purposes of documentation and informa-
tion retrieval.
Gamero (1998) goes further, describing specialised translation competence (specifically in the translation of technical texts) in terms of the characteristics of the texts to be translated. Documentation skills allow translators to acquire
domain-specific factual and linguistic knowledge.
There is no doubt that one of the most important aspects of specialised trans-
lation, and one that may pose a problem for the translator, is the subject matter
involved. A specialised text is both the medium and the product of specialised
communication, usually between experts. The content of the text is specialist
knowledge which, by definition, lies within the realm of the expert. Translators
must know how to compensate for any shortcomings in their knowledge of the
domain within which the text to be translated is located.
Hurtado Albir (2001: 61) describes domain knowledge as, above all, a question of understanding concepts. Unlike experts, translators do not have to create specialist texts on their own. Gaps in domain knowledge, such as unfamiliar terminology, can be compensated for by documentation skills. Terms are units of knowledge specific to specialised language, so specialised texts are the medium in which they naturally occur. They may not form part of a translator’s general linguistic competence (in either the source or the target language), or of his/her cultural or domain knowledge. (In the PACTE model (2000), cultural and domain knowledge are part of the extra-linguistic sub-competence.)
In contrast, Faber (2002) gives priority to the concept of terminological com-
petence in translation and categorises domain-specific knowledge as a part of this
competence. She links the acquisition of terminology in one or more languages
with knowledge of a specific domain.
Despite the differences amongst the translation competence models used by
the different authors, all agree that documentation skills are essential for transla-
tors. They can compensate for shortcomings in knowledge of the subject domain
and the textual characteristics of the text to be translated.
DIY corpora in the specialised translation course 111
2. Documentation in translator training
Given the importance of documentation in the translation process, it should not be neglected in translator training programmes. In professional practice, too, the documentation phase of the translation process is very important. Professional translators may spend as much as 30% of their time retrieving documentary information, mainly to obtain domain-specific and linguistic information (Mayoral 1997/1998: 140).
Therefore, we believe that documentation in specialised translator training
should focus on domain-specific and linguistic information. By domain-specific
information we mean the information pertaining to the notions or units of knowl-
edge of a field or domain that allows translators to understand a text, and that may
be obtained in either the source or the target text language. By linguistic informa-
tion we mean information in either the source or the target text pertaining to a
text’s form as opposed to its content. It includes lexical and grammatical items
related to the microstructure of the text as well as items of style, genre and for-
mat related to the macrostructure of the text. The acquisition of factual informa-
tion implies the acquisition of linguistic information and vice versa, since content
(i.e. specialised knowledge) and form (specialised expressions or terminological
units) go hand-in-hand (Cabré Castellví 2001: 24).
Although there is agreement about the importance of documentation skills as part of translation competence, opinions differ as to how these skills should be acquired. In translation faculties in Spain, subjects that
contribute to the acquisition of documentation skills may be grouped into three
categories:
1. Subjects specific to documentation and subjects providing an introduction to other disciplines.
2. Instrumental or applied subjects.
3. Subjects involving translation.
2.1 Subjects specific to documentation and subjects providing an introduction to other disciplines
The subject of documentation applied to translation may include content that goes beyond what can strictly be considered the context of translation. Moreover, because it is taught as an independent subject with no immediate application to the practice of translation, students find it difficult to acquire the processes, techniques and strategies involved in translation-specific situations.
. Content that is sometimes closely linked to librarianship, such as creating a book register or summarising a text.
Translator training programmes also often include subjects that provide an
introduction to specific domains such as law, economics and engineering as a first
step to training translators specialising in the field. The aim of this type of subject
is to provide trainee translators with sufficient knowledge of the domain to be able
to understand its basic principles as well as the characteristics of its discourse and
the possible text types that may have to be translated. To obtain this knowledge,
the students must become familiar with the different sources of information and
documentation pertaining to the specific field of knowledge in question, whether
it is applied to translation or pursued as an activity in itself.
2.2 Instrumental or applied subjects
Subjects included in this category are present in practically all translation curri-
cula in European universities. They are subjects that have to do with the treatment
and solution of specific problems related to translation. They may include subjects
dealing with terminology and language for specific purposes, and subjects dealing
with translation software.
Subjects dedicated to terminology and language for specific purposes usually show a marked orientation towards terminological methodologies, of which documentation forms an essential part (Maia 2003), although documentation may not necessarily be taught from a translation perspective. Terminological decisions are usually based on original texts in each language without taking into
account the function of the target text or the translation context. Nevertheless,
the processes, techniques and strategies used to acquire documents prior to deci-
sion-making may be considered valid for the documentation process within the
context of translation.
Subjects dealing with translation tools tend to stress the importance of docu-
mentation. With few exceptions, translation tools with documentation functions
usually use the translators’ own output as a documentation resource. Two tools
that are often used by translators and taught to trainees are terminology manage-
ment tools and memory-based computer-assisted translation systems.
. Translation tools is a term that includes both specific tools for translation – in this context
mainly computer-assisted translation (CAT) tools – and tools for resource consultation or de-
velopment (such as corpus tools and terminology management tools).
. The main exponent may be CAT tools based on translation memory.
Terminology management tools help translators to organise the terminology being compiled and to make the best use of it. Some programs have functions for extracting terminology (largely semi-automatically), though only from texts provided by the translators themselves. In computer-assisted translation systems, the functions designed for information retrieval are likewise limited to texts produced or provided by the translators. Within this context, solving translation problems, in particular linguistic ones, relies on searching for examples of usage in translated texts. Today, the available options are no longer limited to searches for character strings. They have progressed to the point of generating what are virtually concordances, or key words in context, just like a corpus manager.
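The difference between a plain character-string search and a search for segments containing several words that need not be adjacent can be sketched as follows. The translation memory here is a hypothetical list of English-Spanish segment pairs from an air-conditioning manual. The matching rule, under which every query word must appear as a whole word in the source segment, is an assumption. Commercial CAT tools implement their own, more elaborate, concordance functions.

```python
import re


def words(segment):
    """Set of lower-cased word forms in a segment (crude tokenisation, an assumption)."""
    return set(re.findall(r"\w+", segment.lower()))


def concordance(memory, query):
    """Return (source, target) pairs whose source contains every query word,
    in any order and not necessarily adjacent."""
    wanted = words(query)
    return [(src, tgt) for src, tgt in memory if wanted <= words(src)]


# Hypothetical English-Spanish translation memory for an air-conditioning manual.
memory = [
    ("Clean the air filter every two weeks.",
     "Limpie el filtro de aire cada dos semanas."),
    ("Replace the filter if it is damaged.",
     "Sustituya el filtro si está dañado."),
    ("Switch off the unit before cleaning the air outlet.",
     "Apague la unidad antes de limpiar la salida de aire."),
]

for src, tgt in concordance(memory, "filter clean"):
    print(src, "=>", tgt)

# A plain character-string search finds nothing for the same query.
print(any("filter clean" in src.lower() for src, _ in memory))
```

Because the query is treated as a set of words, "filter clean" retrieves the segment "Clean the air filter…" even though the words appear in a different order and are not contiguous. "Cleaning" is not matched, since no lemmatisation is attempted.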
2.3 Subjects involving translation
The documentation process in translation is certainly best illustrated and contextualised in these subjects. In a translation situation, faced with the source text
and a translation assignment, students begin to make decisions based on their
own knowledge or on information obtained from sources consulted for the pur-
pose. In translation classes, in particular specialised translation classes, purely
linguistic documentation resources (dictionaries, glossaries, databases, etc.), as
well as reference books and parallel texts are traditionally presented as sources of
specialist information for translation purposes. Reference books – generally field-
specific informative texts that are either introductory or general in scope – are
usually consulted in search of information that will help students understand the
source text. Parallel texts, whether in the source or target text languages, usually
provide information on text-type conventions or particularities of field-specific
language use (terminological, collocational, phraseological, syntactical, etc.). In
either case, students must read the texts they consult (usually to get the gist) and
hope that the small selection of texts they have used will provide them with the
solution to the translation problems they have found in the original text.
However, the Internet has now become the main source of documentation and can provide an enormous variety of texts (reference works and parallel texts), but
at the same time it generates problems which may interfere with the translation
training process:
1. The large amount of information that can be accessed with a single mouse click, together with the hypertextual structure of the data available, causes what some authors refer to as infoxication and cognitive overload. Students, in particular those who are less confident about their knowledge of the topic and/or their documentation skills, are unable to set limits on their documentation process, since in each new parallel text or reference work they may find the perfect piece of information to solve their translation problems. Thus, they fail to trust their own competence and criteria as trainee translators, and they base their translation decisions on any text obtained through a documentation process that is often far too long.
2. Furthermore, information retrieved from the Internet does not necessarily meet minimum quality requirements, since it has not passed through a peer-review process comparable to that applied to printed publications. Students who rely on the Internet as their main source of documentation are therefore in a very weak position, since the texts obtained are often of doubtful quality as regards authority, subject matter and language use. Any glossary or website may be used as a reference source on which students base their translation decisions, without checking it against the minimum quality indicators for a Web resource (Sánchez-Gijón 2004: 34). This lack of a critical attitude towards Internet resources is very common among students who are beginning their specialised translation training.
3. This kind of indiscriminate consultation of texts does not allow information to be retrieved systematically. Despite the advantages of the digital format of the texts consulted, students continue to rely on reading for gist as their main means of obtaining information.
. For instance, Transit’s concordance searches, which allow users to search for segments containing several words that need not be adjacent.
Therefore, it would seem that maintaining traditional documentation methods in translation training while using online material only combines the disadvantages of consulting printed sources with those of consulting unreliable ones, and fails to take advantage of the benefits of access to texts in digital format. In order to make full use of the documentary resources available to translators today, trainers should consider the methodologies offered by corpus linguistics.
. Infovis defines infoxication as “Intellectual intoxication produced by an excess of information.” (http://www.infovis.net/index.php?lang=1).
. According to Codina (2002), disorientation of the reader caused by the network established
within the hypertext of a document or a number of documents appearing on the Net.
. Very often students look for each terminological problem directly in a search engine, even
though they have already retrieved texts to solve previous problems that could help them with
this new problem. This never-ending documentation strategy usually leads to inconsistencies,
since every new solution found may come from a context that has very little in common with
the context from which earlier solutions come.
3. The use of a DIY corpus as a documentation resource
Having compared the different teaching approaches to documentation, it seems clear that documentation skills for translation are best acquired in a translation course, as a tool to solve translation problems. In order to overcome the disadvantages of traditional documentation methodologies and the vastness of the Internet, we propose a do-it-yourself corpus as a documentation resource.
For the purposes of specialised translation, we consider a do-it-yourself (DIY) corpus to be a corpus of texts put together for the sole purpose of providing the factual, linguistic or field-specific information needed to complete a translation task (Sánchez-Gijón 2004). Corpora constructed specifically as a resource for a given translation task have also been called ad hoc corpora or disposable corpora (Varantola 2003). The usefulness of this kind of corpus not just as a training resource but also as a resource for professional translators has been pointed out as well (Zanettin 2002; Sánchez-Gijón 2005).
The following section deals with building DIY corpora and developing skills
for using them in specialised translation training. A DIY corpus should be built
bearing in mind the translation problems posed in the source or target texts. The
examples given here are from the field of air conditioning. The introduction of
this resource into the translation classroom requires computers, Internet access to
search for texts to form part of the corpus and a corpus manager.
3.1 Retrieving texts for the DIY corpus
Faced with a translation task, translation students learn that professional transla-
tors may look for texts and create their own DIY corpus from the following three
sources:
1. The client.
2. Specialist centres.
3. The Internet, as a means of obtaining information in multiple digital
formats.
These conditions can also be reproduced in the translation classroom. In the first instance, the lecturer may act as the client and provide documentation that students may incorporate into their DIY corpus. By doing this, however, the lecturer is not helping students acquire or improve their documentation skills. Lecturers may also give students access to specialist sources of documentation, such as databases, academic articles and institutional documentation centres, or encourage them to consult these sources. By directing students to such centres, the lecturer can be sure that the information they obtain is of high quality and homogeneous in terms of subject matter, text format and/or text genre. Unfortunately, resources of this kind are not always available to our students for every subject domain.
. These examples were worked on with English to Spanish postgraduate translation and localisation students of the Autonomous University of Barcelona and the Jaume I University during the 2006–2007 academic year.
Resorting to the Internet as a means of accessing texts for a corpus is the most viable option for documenting any translation task. In the previous section we mentioned the absence of quality controls for many of the texts retrieved from the Internet. It might seem that this problem also affects a DIY corpus, since each text included in it is of no better and no worse quality than the parallel texts used in traditional documentation. However, by collecting a large number of texts in a DIY corpus, analysing them together and observing the different phenomena quantitatively, we can be sure that any translation decision will be based on a number of different texts, that is to say, on a number of different authors. Thus, even if each individual text may not be particularly reliable, the analysis of all the texts taken together will be. In other words, finding the same terminological solution, for instance, in texts written by different authors reflects a consensus on its use. Thanks to corpus linguistics methodology, every text may be validated against the other texts in the DIY corpus, thus ensuring quality in the analysis.
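Validating a solution by the number of texts, and therefore authors, that use it can be sketched in a few lines of Python. This is a minimal illustration rather than a feature of any particular corpus manager. It assumes the corpus has already been loaded as a list of plain-text strings, one per downloaded document. The function name `document_support` and the sample sentences are hypothetical.

```python
import re
from collections import Counter


def document_support(texts, candidates):
    """Count in how many distinct texts each candidate expression appears.

    A text counts once per candidate, however often the candidate occurs in it,
    so the score approximates the number of independent sources (authors).
    """
    support = Counter()
    for text in texts:
        lowered = text.lower()
        for candidate in candidates:
            pattern = r"\b" + re.escape(candidate.lower()) + r"\b"
            if re.search(pattern, lowered):
                support[candidate] += 1
    return support


if __name__ == "__main__":
    # Hypothetical mini-corpus: three short texts from different sources.
    corpus = [
        "Ajuste la temperatura ambiente con el termostato.",
        "La temperatura ambiente actual aparece en la pantalla.",
        "Mantenga la temperatura de la sala entre 20 y 25 grados.",
    ]
    ranking = document_support(corpus, ["temperatura ambiente", "temperatura de la sala"])
    for expression, n_texts in ranking.most_common():
        print(f"{expression}: {n_texts} text(s)")
```

The script counts texts rather than raw occurrences, so one repetitive document cannot outweigh several independent ones.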
Using the Internet as a means to access texts to build a corpus does, however,
have its problems and these must be taken into consideration. If students are to
build their own corpus, a systematic approach should be taken and three different
stages should be clearly differentiated.
3.1.1 Determining the characteristics of the resource that will provide the texts
A search engine or directory is usually used to carry out searches on the Internet and to obtain a list of texts that may be included in the DIY corpus constructed by students. As well as general search engines such as Google or Yahoo!, it is worth consulting search engines that index only specific resources. Two types of resource may be particularly useful for preparing a DIY corpus on a specific subject:
1. Search engines specialising in a specific field or discipline (known as vertical search engines), or in a specific type of text or text genre (for example, search engines for academic texts). Their databases comprise only field-specific texts or resources, or resources produced in highly controlled contexts.
2. Regional or subject portals. These do not usually have a database of their own but provide direct links to documents within the portal itself or to related external documents.
. Varantola (2003, 64) pointed out the unreliability of many Internet texts as a drawback that may affect the construction of this kind of corpus. It was also pointed out as a shortcoming by Zanettin (2002, 12): “The relevance and reliability of documents to be included in the corpus needs to be carefully assessed.”
3.1.2 Using queries to carry out more precise searches

Internet search engines currently provide lists of hundreds of thousands of results for each search. However, these results can include resources of many different kinds as regards subject matter, text genre and content. As Zanettin stated:
“Building a corpus of web pages basically involves an information retrieval opera-
tion, conducted by browsing the Internet to locate relevant and reliable documents
which can then be saved locally and made into a corpus to then be analysed with
the help of concordancing software” (2002, 12). As we wish to improve students’
documentation skills for constructing a DIY corpus using texts from the Internet,
it is worth taking advantage of all the possibilities offered by search engines and
carrying out more specific searches. The texts retrieved will thus share not just the
topic but also the language variant and even the text genre. To build a corpus in
the field of air conditioning, students carried out the following searches:10
– Using key words or expressions that identify the term within the context of its subject matter (e.g. “aire acondicionado” + instalación).
– Excluding keywords that lead to heterogeneous subject matter (e.g. “aire acondicionado” -“bomba de calor”).
– Including words related to the genre or text type they were most interested in (e.g. “aire acondicionado” + “manual de instrucciones”11 + véase12).
– Taking advantage of the specific functions of some search engines to obtain much more specific, homogeneous results, such as searching for synonyms,13 searching for words close to a keyword (similar to searching for documents with co-occurrences)14 and searching by file extension.15
10. All the searches below are expressed according to Google search rules and operators.
11. “Manual de instrucciones” (“User guide” in English): the name of the text genre is used as a search keyword, since most of these texts include the genre name as part of their title.
12. A typical expression used in this genre (“see” in English). Expressions like this, which are linked to a genre, are very useful for delimiting searches and obtaining results that are homogeneous in terms of genre.
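Each search refinement above comes down to assembling a query string from required terms, excluded terms and a file-type restriction. The hypothetical helper `build_query` sketches that assembly. It assumes Google-style syntax: quotes for exact phrases, a leading minus for exclusion, and `filetype:` for the format. Operator support differs between engines and has changed over time, so the output should be checked against the engine's current help pages.

```python
def build_query(required, excluded=(), file_type=None):
    """Assemble a Google-style query string (illustrative only).

    required  -- terms or phrases that must appear
    excluded  -- terms or phrases whose presence rules a result out
    file_type -- optional file extension, e.g. "pdf"
    """

    def quote(term):
        # Multi-word expressions are quoted so they are matched as exact phrases.
        return f'"{term}"' if " " in term else term

    parts = [quote(term) for term in required]
    parts += ["-" + quote(term) for term in excluded]
    if file_type:
        parts.append(f"filetype:{file_type}")
    # Space-separated terms are combined with an implicit AND.
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(["aire acondicionado", "instalación"]))
    print(build_query(["aire acondicionado"], excluded=["bomba de calor"]))
    print(build_query(["aire acondicionado", "manual de instrucciones", "véase"], file_type="pdf"))
```

The three calls correspond to the first three strategies in the list and print `"aire acondicionado" instalación`, `"aire acondicionado" -"bomba de calor"` and `"aire acondicionado" "manual de instrucciones" véase filetype:pdf`.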
Downloading texts correctly. According to Samson (2005, 102), one of the four cross-curricular skills related to computer use (or, as he calls it, ‘computer literacy’) in translation is file management.16 In order to use the texts found by the search in a DIY corpus, students must download them so that they can be used locally. When downloading texts, students may save time, and avoid possible technical problems, if they know what they want to download and what type of documents they are dealing with. Given that only the textual elements of the retrieved documents can be included in the DIY corpus, non-textual elements such as pictures and sound need not be downloaded. This minimises both the time taken to download documents and the space they occupy on computers. In order to help develop students’ computer skills, it should be pointed out that documents may be downloaded in two different ways:
– If a complete website or a directory from a website is to be downloaded, a website copier may be used.
– If documents found by a specific search are to be downloaded, a client-based metasearch system17 may be used to search for, and download, documents on the Internet.
13. The search for synonyms in Google may be carried out using the tilde operator (~). For example, a search for ~physician will return results that include this word or words from the same semantic field, such as doctor, medical and hospital.
14. The search for words close to a keyword may be carried out in Exalead using the operator NEAR. It can link key words (e.g. “aire acondicionado” NEAR instalación) or key words and expressions that are specific to the style of a genre or text type (e.g. “aire acondicionado” NEAR “para más información”).
15. An example of a search by file type: “aire acondicionado” + “manual de instrucciones” ext:pdf. Only files containing these key words will be retrieved, and all of them will be in PDF, the most usual format for user handbooks on the Internet.
16. These skills include: configuration of the user’s workstation, file management, digital text
production (word processing) and basic Internet use. Two of these four skills, file management
and basic Internet use, are worked on and improved through the kind of translation activities
or tasks that are proposed in this paper.
17. Client-based metasearch applications carry out searches on several search engines at once (hence the prefix meta-). Their output is a single list of results from the different search engines, and the documents in that list can then be downloaded.
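The advice to download only textual material can be made concrete with a small filter that skips images, audio and other non-textual files before anything is fetched. This sketch rests on stated assumptions:
- It decides by file extension alone, and the `TEXT_EXTENSIONS` set is our own choice.
- It treats paths without an extension as HTML pages.
- It relies on the standard library's `urllib.request.urlretrieve`.

It is neither a website copier nor a metasearch client, and the function names are hypothetical.

```python
import os
import urllib.request
from urllib.parse import urlparse

# Extensions treated as textual; this set is an assumption, adjust as needed.
TEXT_EXTENSIONS = {".html", ".htm", ".txt", ".pdf"}


def extension_of(url):
    """Lower-case extension of the URL path ('' if the path has none)."""
    return os.path.splitext(urlparse(url).path)[1].lower()


def is_textual(url):
    """True for URLs that probably point to text rather than images or sound."""
    ext = extension_of(url)
    # Paths without an extension (e.g. '/manuales/') are usually HTML pages.
    return ext == "" or ext in TEXT_EXTENSIONS


def download_textual(urls, target_dir):
    """Save only the textual documents in urls to target_dir; return the paths."""
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        if not is_textual(url):
            continue  # skip pictures, audio, video, etc.
        ext = extension_of(url) or ".html"
        destination = os.path.join(target_dir, f"doc{index:04d}{ext}")
        urllib.request.urlretrieve(url, destination)
        saved.append(destination)
    return saved
```

A call such as `download_textual(result_urls, "corpus_aire")`, where `result_urls` is a list of hits copied from a search, would save only the HTML, text and PDF documents under numbered file names.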
Pretreatment of texts. Once texts have been downloaded, it is sometimes necessary to make certain changes to the files to avoid problems when accessing them with corpus tools. Two types of operation may be carried out:
– If the files are in formats other than HTML or its derivatives, they must be converted to plain text (txt) using a suitable conversion program.18 Such programs generally convert all the files in a directory in a single operation, so this step is virtually automatic.
– If the files are in HTML or a derived format, most corpus tools can process them directly. However, depending on how these files were originally edited, it may be necessary to convert some codes into special characters.19 This may be done using a format converter or even some corpus tools (e.g. WordSmith Tools20).
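For HTML pages, two operations can be sketched with Python's standard `html.parser` module: removing the markup, and turning character codes such as `&Aacute;` into the characters they stand for. With `convert_charrefs=True` the parser decodes character references itself. This is an illustrative sketch, not a replacement for a dedicated converter. The set of tags that start a new line is our own assumption, and PDF-to-text conversion needs an external program, so it is not covered here.

```python
from html.parser import HTMLParser

# Tags after which a line break is inserted (an assumption, not exhaustive).
BLOCK_TAGS = {"p", "br", "div", "li", "td", "tr", "title",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping scripts and styles."""

    def __init__(self):
        # convert_charrefs=True turns references such as &Aacute; into Á.
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    """Return plain text: one line per block, runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    page = "<p>&Aacute;rea <b>t&eacute;cnica</b></p><p>Se&ntilde;al de aviso</p>"
    print(html_to_text(page))
```

Run on the sample page, this prints `Área técnica` and `Señal de aviso` on separate lines, ready to be saved as a .txt file for the corpus manager.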
All these operations require computer skills that students should have acquired
during their training, since they are not specific to translator training. However, it
is often necessary to ensure that all the students in the translation class have suffi-
cient instrumental competence to allow them to carry out these tasks successfully.
It is in the translation classroom that students’ computer skills are contextualised
and become part of their translation competence.
3.2 Analysis of the DIY corpus
Once the operations described in Section 3.1 have been carried out, a DIY corpus of possibly several tens of thousands of words may be constructed. The source text used to illustrate the documentation process described in this article is an English instruction manual for an air conditioning system manufactured by Carrier Heating & Cooling (Model OM38–45), published in Indianapolis in 1998. The manual was to be translated into Spanish, and a corpus of some 30,000 words in Spanish on the subject of air conditioning was built. In specialised translation, the DIY corpus provides the factual and/or linguistic information the translator needs to complete the translation task. It is usually the corpus built in the source language that provides most factual information, since that is the language in which cognitive problems will occur (Sánchez-Gijón 2005). However, it is the target-language corpus that most helps students to identify and solve many of the linguistic problems related to knowledge shortcomings. Students understand the underlying concept of a specific source-language term, but they do not know how to express it in the target language, or do not even realise that it is a fixed term and not just a casual expression.21
18. The most common conversion applications used in this kind of activity are those that convert PDF files into txt files. There are many different applications, both freeware and shareware, that can be located through any search engine.
19. This is usually the case with accented letters or special characters that occur in some languages (e.g. Ñ or Ç). In HTML documents these may appear as a code which is then interpreted by the browser and reproduced as the appropriate character. For example, an Á may be represented in the code as &Aacute;.
20. Scott 1996.
Since the translators in our examples are non-native speakers of the source language, the specific linguistic problems from the source text discussed below illustrate problems that can be solved using students’ linguistic and translation intuition, with a DIY corpus as the only resource available for validating that intuition.
3.2.1 Room temperature

The term room temperature occurs frequently in the source text, and the translator could render it in several different ways. However, on one occasion it appears in the diagram of the control unit of the air conditioning system, designating a temperature indicator, together with two hyponyms: current room temperature and desired room temperature. The translation of all three expressions must use the established term so that the target user of the translation can recognise it. If the translation task is performed with a translation memory, the memory may offer segments that contain the established equivalent. If, on the other hand, a DIY corpus is used, the contexts of temperatura are retrieved from the Spanish corpus and the most common expression with the same meaning as room temperature is selected. The results obtained are shown in Table 1.
As can be seen, temperatura ambiente is the most commonly used expression
and may itself be used as an equivalent for room temperature. However, it may
be observed that a distinction is made between temperatura ambiente actual and
temperatura ambiente deseada, thus corroborating the use of ambiente in this con-
text and providing students with terminological solutions for other translation
problems related to the problem they have attempted to solve.
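The query behind Table 1 is a keyword-in-context (KWIC) concordance, and the choice of temperatura ambiente rests on which word most often follows the node. The sketch below covers both steps, assuming the Spanish corpus is available as a single plain-text string. The function names and the 30-character context width are our own choices, not those of any particular corpus manager.

```python
import re
from collections import Counter


def kwic(text, node, width=30):
    """Return (left, node, right) tuples for every whole-word occurrence of node."""
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()].replace("\n", " ")
        right = text[match.end():match.end() + width].replace("\n", " ")
        lines.append((left, match.group(0), right))
    return lines


def right_collocates(lines):
    """Count the first word to the right of the node in each concordance line."""
    counts = Counter()
    for _, _, right in lines:
        words = re.findall(r"\w+", right.lower())
        if words:
            counts[words[0]] += 1
    return counts


if __name__ == "__main__":
    corpus = ("Muestra la temperatura ambiente actual. "
              "Ajuste la temperatura ambiente deseada. "
              "La temperatura de la casa puede bajar.")
    lines = kwic(corpus, "temperatura")
    for left, node, right in lines:
        print(f"{left:>30} {node} {right}")
    print(right_collocates(lines).most_common(3))
```

On this toy text the last line prints `[('ambiente', 2), ('de', 1)]`. On a real corpus the same ranking would point the student to temperatura ambiente.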
3.2.2 Personal injury

In the original text, the term personal injury also occurs on a number of occa-
sions. Students may well have no problem understanding this term, and may be
able to suggest possible translations. However, this term is very closely linked to
21. This is the case of current room temperature, which is reported next.
Table 1. Sample of the concordances obtained for temperatura

mínimo cualquier diferencia en la temperatura.
nfriamiento y uno para ajustar la temperatura
INICIALIZAR FILTRO LA TEMPERATURA ACTUAL
ido cualquier diferencia en la temperatura. Además, los sistemas
de su hogar funciona hasta que la temperatura ambi- ua independientemen
TEMPERATURA AMBIENTE ACTUAL,
TEMPERATURA AMBIENTE ACTUAL,
MUESTRA LA TEMPERATURA AMBIENTE
MUESTRA LA TEMPERATURA AMBIENTE
hogar funcionará hasta que la temperatura ambiente aumente MODO(MODE
ACTUAL, LA TEMPERATURA AMBIENTE BOTONES P
ACTUAL, LA TEMPERATURA AMBIENTE
TEMPERATURA AMBIENTE DESEADA,
TEMPERATURA AMBIENTE DESEADA,
ra esté ajustado por debajo de la temperatura ambiente interior y e
junto ajustado por encima de la temperatura ambiente y que el control
ngŸeta 1/83 Tegular estén a temperatura ambiente y tengan el conteni
ura esté ajustado por debajo de la temperatura ambiente y INFORMACIÓN
ducto ajustado por encima de la temperatura ambiente y que el con-
yor al 70% o menor del 20% y la temperatura antes, durante y d
sea necesario para conservar la temperatura apropiada relativa
en dos selectores para control de temperatura: calor, su aire acondicion
menta por encima del ajuste de la temperatura de enfriamiento
ajuste de la temperatura de enfriamiento, o se activa
tes de que se despierte. La temperatura de la casa puede
despierte. La temperatura de la casa puede entonces di
ed quiera “regresar” la temperatura de su casa en la noche y lue
le que usted quiera “regresar” la temperatura de su casa en
DESEADA O LA TEMPERATURA DEL
DESEADA O LA TEMPERATURA DEL DE
activa si la temperatura del interior aumenta por enc
the text genre we are working in, so it is not possible to make any decisions with-
out checking the accepted expression. To do this a search is made using persona*
to obtain practically all possible forms of this root. Table 2 shows all the concor-
dances obtained.
Of all the possible combinations, clearly the most common is lesiones perso-
nales in the plural. Two verbs that accompany lesiones personales have also been
identified. These are evitar – which coincides semantically with avoid, the verb
accompanying personal injury in the original text – and producir.
Table 2. Concordances for personal*

ra que desea mantener para su comodidad personal.
ra que desea mantener para su comodidad personal. Algun- da hasta el nivel
expulsar el aire viciado comodidad personal al mantener el ventilador encen
la humedad de su casa durante didad personal al mantener el ventilador encen
stemas de cli- drían causar lesiones personales o daños a la matización d
matización de drían causar lesiones personales o daños a la
Para evitar lesiones personales o la muerte,
reemplazarlo. Para evitar lesiones personales o la muerte,
. Programe la Para evitar lesiones personales, muerte, o daños
ntes: Para evitar lesiones personales, muerte, o daños
dades están interconectadas lesiones personales.
, tal como se indica en la lesiones personales.
o seguros que podrían producir lesiones personales leves o DATOS IMPORTA
o seguros que podrían producir lesiones personales leves o
que podrían producir lesiones personales o la muerte. PRE-
n el que podrían producir lesiones personales o la muerte. PRE- fut
ayor gravedad que producirán lesiones personales severas o la los en la ú
ayor gravedad que producirán lesiones personales severas o la elo y serie
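A wildcard query such as the one behind Table 2 can be approximated with a regular expression in which `*` stands for any run of word characters. Counting the matched forms, and the word immediately before lesiones personales, then brings out the verbs evitar and producir. The sketch assumes that reading of the wildcard (corpus managers differ in detail), and the helper names are hypothetical.

```python
import re
from collections import Counter


def wildcard_to_regex(query):
    """Compile a query such as 'personal*' into a case-insensitive whole-word regex."""
    # '*' is read as 'zero or more word characters'; everything else is literal.
    body = r"\w*".join(re.escape(part) for part in query.split("*"))
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def form_frequencies(text, query):
    """Frequency of every distinct word form matched by the wildcard query."""
    return Counter(m.group(0).lower() for m in wildcard_to_regex(query).finditer(text))


def preceding_words(text, phrase):
    """Frequency of the word immediately to the left of each occurrence of phrase."""
    words = r"\s+".join(re.escape(w) for w in phrase.split())
    pattern = re.compile(r"\b(\w+)\s+" + words + r"\b", re.IGNORECASE)
    return Counter(m.group(1).lower() for m in pattern.finditer(text))


if __name__ == "__main__":
    corpus = ("Para evitar lesiones personales o la muerte, desconecte la unidad. "
              "Podrían producir lesiones personales leves. "
              "Ajuste la temperatura para su comodidad personal.")
    print(form_frequencies(corpus, "personal*"))
    print(preceding_words(corpus, "lesiones personales"))
```

Splitting the phrase on whitespace and joining it with `\s+` lets the search still match when a line break falls between lesiones and personales in a converted text.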
3.2.3 Addressing the reader of the translated text

When a text is going to be translated, certain decisions have to be taken, such as how the author should address the reader of the target text. Genre conventions of this kind do not always coincide across languages, and it is worth comparing them before starting to translate, because the forms of address should be used consistently throughout the translation. Attending to this kind of problem will help students develop a protocol for recognising genre conventions and planning an appropriate translation strategy.
Information about these conventions can be obtained in a variety of ways, including finding out which personal pronouns are used most commonly by checking the list of the most frequently used words in the corpus.22 Another method is to look in the corpus for the most commonly used verb forms. In the Spanish corpus the most frequent verb forms are debe, ser, deben, puede, están and está. More information can be obtained by studying the concordances of deb*, a query that retrieves all the conjugated forms of the verb deber:
22. Teaching materials designed by Patricia Rodríguez and Pilar Sánchez-Gijón for the UAB
Master in Tradumàtica: Translation and Information Technologies and first used in the class on
“Using electronic corpora” in 2003–2004.
Table 3. Examples of deb* concordances ordered to the right

ne filtros reutilizables. sólo se debe realizar este tipo de mantenim
man- Los filtros desechables deben reemplazarse con filtros simi
el ser- Los filtros desechables deben reemplazarse con filtros simi
ctores entre los paneles no se deben sacar hasta elcompletamente
tores entre los paneles no se deben sacar hasta el completamente s
3 deben instalarse en la vista deben ser arreglados para que luzcan
oportes torcidos o abollados deberán ser completamente enderezad
ra yEl sistema de suspensión debe ser de retícula de Te condiciones
ra y El sistema de suspensión debe ser de retícula de Te condicione
motor del ventilador exterior deba ser desconecta- el peor de los c
motor del ventilador exterior deba ser desconecta- resolución de p
ALADOR: ESTE MANUAL DEBE SER ENTREGADO AL USU
ALADOR: ESTE MANUAL DEBE SER ENTREGADO AL USU
a con filtros deshidratadores. Deberá ser examinada y remplazada
vida de su zación de su hogar debe ser inspecciona- unidad. Cons
n y la polea) es importante, debiendo ser inspeccionado antes de l
IL RESOLUCIÓN: su hogar debe ser inspeccionado con fre- ¥ Re
DE FÁCIL RESOLUCIÓN: deben ser inspeccionados en este mo
atura ambiente y que el con- deben ser inspeccionados en este mo
os componentes fallarán y deberán ser limpio instalado, debería n
inua. La humedad relativano debe ser menor del 20% o mayor del
nua. La humedad relativa no debe ser menor del 20%
mente, y en muchos casos deberán ser reemplazados. Lubricación
tros elementos corrosivos deberán ser removidos a fin de prevenir
sistema con acumulador, y debe ser periódicamente reemplazada p
stante de aceite requerida deberá ser vertida en el acumulador o el
n del aceite, el compresor deberá ser volteado con el sello frontal
pie2. Las Tes principales deben soportar el peso de los paneles ad
pie . Las Tes principales deben soportar el peso de los paneles ad
omento de la instalación. Debe tener cuidado al estar instaladas. E
omento de la instalación. Debe tener cuidado alestarinstaladas. Els
x 22. Las Tes principales deben tener intervalos a cada 483. Las T
‘ x 2’. Las Tes principales deben tener intervalos a cada 48?. Las T
las Tes secundarias de 42 deben tener la capacidad de sostener míni
y las Tes secundarias de 4’ deben tener la capacidad de sostener mí
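Table 3 is ordered to the right: its lines are sorted alphabetically on the words that follow the node, which brings repeated patterns such as deben ser or debe tener together. A minimal sketch of that ordering follows, assuming a plain-text corpus string and the same reading of `*` as any run of word characters. The function name is hypothetical.

```python
import re


def concordance_sorted_right(text, query, width=30):
    """KWIC lines for a wildcard query, sorted on the right-hand context."""
    body = r"\w*".join(re.escape(part) for part in query.split("*"))
    pattern = re.compile(r"\b" + body + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(0), right))
    # Sort case-insensitively on the right context, ignoring its leading space.
    lines.sort(key=lambda line: line[2].strip().lower())
    return lines


if __name__ == "__main__":
    corpus = ("Los filtros deben reemplazarse. El manual debe ser entregado. "
              "La unidad debe ser inspeccionada. Debe tener cuidado.")
    for left, node, right in concordance_sorted_right(corpus, "deb*"):
        print(f"{left:>30} {node} {right}")
```

Because Python's sort is stable, lines whose right contexts start identically keep their corpus order. That keeps related examples near each other, as in Table 3.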
An analysis of the concordances of debe and deben shows the predominance of the structure DEBER + INFINITIVE, which is used to give instructions when the agent who is to carry out the instructions does not appear explicitly. Therefore, in these cases the author does not address the reader directly.
The second most common verb form observed is the infinitive ser. The results
of a more detailed analysis of the concordances of ser (Table 4) show that most of
the structures are MODAL VERB (deber, poder, …) + SER + PARTICIPLE. Once
again, this structure is used to give instructions when the agent who is to carry out
the instructions does not appear explicitly.
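The two structures identified here, DEBER + INFINITIVE and MODAL VERB + SER + PARTICIPLE, can be roughly counted with regular expressions once the corpus is in plain text. This is a heuristic sketch, not a grammatical analysis, and it makes several assumptions:
- Infinitives end in -ar, -er or -ir, optionally followed by an enclitic pronoun such as -se.
- Only regular participles in -ado/-ido and their inflected forms are recognised.
- The modal verbs are limited to deber and poder.

Irregular participles are missed, and any word that merely ends in -ar, -er or -ir is taken for an infinitive.

```python
import re

# Heuristic word patterns (assumptions, not a morphological analyser).
INFINITIVE = r"\w+(?:ar|er|ir)(?:se|lo|la|los|las)?"   # reemplazar, reemplazarse
PARTICIPLE = r"\w+(?:ad|id)(?:o|a|os|as)"              # entregado, usadas
MODAL = r"(?:deb|pod|pued)\w*"                         # debe, deberán, podrán, puede

PATTERNS = {
    "DEBER + INFINITIVE": re.compile(
        r"\bdeb\w*\s+" + INFINITIVE + r"\b", re.IGNORECASE),
    "MODAL + SER + PARTICIPLE": re.compile(
        r"\b" + MODAL + r"\s+ser\s+" + PARTICIPLE + r"\b", re.IGNORECASE),
}


def count_structures(text):
    """Number of (non-overlapping) matches of each structure in text."""
    return {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}


if __name__ == "__main__":
    corpus = ("Los filtros deben reemplazarse. Este manual debe ser entregado al usuario. "
              "Las piezas podrán ser usadas. La humedad no debe ser menor del 20%. "
              "Debe tener cuidado.")
    print(count_structures(corpus))
```

The second pattern matches a subset of what the first finds with deber. Debe ser entregado therefore counts under both headings, much as the ser concordances overlap with those of deb* in Tables 3 and 4.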
Table 4. Concordances of ser
ser suficientes para mantener el sistema funcionando
l ventilador exterior deba ser desconecta- resolución de
l ventilador exterior deba ser desconecta- el peor de los
ación de su hogar debe ser inspecciona- unidad.
OLUCIÓN: su hogar debe ser inspeccionado con fre-
TE MANUAL DEBE SER ENTREGADO AL
TE MANUAL DEBE SER ENTREGADO AL
relativa no debe ser menor del 20%
humedad relativano debe ser menor del 20% o mayor
de suspensión debe ser de retícula de Te
ma de suspensión debe ser de retícula de Te condici
con acumulador, y debe ser periódicamente reemplaza
te y que el con- deben ser inspeccionados en este m
ESOLUCIÓN: deben ser inspeccionados en este
talarse en la vista deben ser arreglados para que luzcan
te, el compresor deberá ser volteado con el sello frontal
deshidratadores. Deberá ser examinada y remplazada, si
aceite requerida deberá ser vertida en el acumulador o e
os o abollados deberán ser completamente enderezado
n muchos casos deberán ser reemplazados. Lubricación
ntos corrosivos deberán ser removidos a fin de prevenir
tes fallarán y deberán ser limpio instalado, debería
s importante, debiendo ser inspeccionado antes de la
antenimiento cuando el ser- Los filtros desechables d
medad que haya podido ser absorbido por el aceite PAG
12 o el R134a podrán ser usados para mantener el d
btener servicio. Podría ser necesario limpiar el serpent
ectores de aire podrían ser muy fríos o muy calientes
de tores de aire podrían ser muy fríos o muy calientes pa
cual no tendrá porqué ser drenada. No obstante, si
l de temperatura puede ser un cuadrante, trabajar du
ol de temperatura puede ser un cuadrante, una Al operar
termostato puede ser PROGRAMABLE o
Su termostato puede ser PROGRAMABLE o NO P
ecificaciones pueden ser encontradas en el Temp
o especial para re- ser suficientes para mantener
Table 5. Conjugated forms of consultar in the corpus

ara el medio ambiente. Consulte a un instalador
adecuada de lubricante, consulte el catálogo de
el reborde inferior. ma. Consulte a su instalador para
– peratura es necesario, consulte a su instalador. biar los
gina de este folleto para consultarlos en el que podrían
na de este folleto para consultarlos en el muerte. A
efactor de gas o petróleo, consulte el La unidad exterior
actor de gas o petróleo, consulte el obtener información
N SOBRE PIEZAS: Consulte a su representante de
RE PIEZAS: Consulte a su representante de
putación. propiedad. Consulte a un instalador calificado,
y seco.consulte las instrucciones de i
Consulte el Manual del usuario de
sistema doble. Si su Consulte el Manual del usuario de
pecciona- unidad. Consulte a su contratista de i
la vida de su unidad. Consulte a su contratista de insta-
So far students may have discovered that the reader is not usually addressed directly, but they still do not know how the author addresses the reader when the address is explicit and direct. If they had a list of lemmatised words, they could analyse the concordances of all the conjugated forms of one verb in particular. As they do not have such a list (since this DIY corpus is not lemmatised), students can continue down the list of the most frequently used words until they reach the first conjugated form of a verb that is not a modal verb. The first such verb on the list is consulte. Given that it is a regular verb, all the concordances of consult* may be extracted in order to obtain all the conjugated forms of the verb.
The results (Table 5) show that the only conjugated form of the verb consultar used in this corpus is consulte, the formal imperative used to address the reader as usted. The pronoun usted itself is usually elided; there are only a few examples in which it is included. The other verb forms in the list confirm this level of formality in addressing the reader. The analysis of the Spanish corpus has thus revealed that in this textual genre authors address readers formally, although the most common practice is to use impersonal structures with no direct reference to the reader.
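The procedure in this subsection starts from a word-frequency list and walks down it, skipping forms already examined (modal verbs, function words) until a new verb form such as consulte turns up. Because the corpus is not lemmatised, deciding which forms to skip remains a human judgement. In the sketch below that judgement is represented by a hand-made set of already examined forms, which is purely illustrative, as are the function names.

```python
import re
from collections import Counter


def frequency_list(text):
    """All word forms (lower-cased, letters only) sorted by descending frequency."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words).most_common()


def next_unexamined(freq_list, already_examined):
    """First (form, count) pair in the frequency list not yet examined, or None."""
    for form, count in freq_list:
        if form not in already_examined:
            return form, count
    return None


if __name__ == "__main__":
    corpus = ("Debe ser inspeccionado. Consulte a su instalador. "
              "Consulte el manual. La unidad debe ser limpiada. Debe tener cuidado.")
    freq = frequency_list(corpus)
    print(freq[:5])
    # Hypothetical skip list: modal forms, the infinitive ser and function words.
    examined = {"debe", "deben", "ser", "puede", "a", "el", "la", "su"}
    print(next_unexamined(freq, examined))
```

On this toy text the second print gives `('consulte', 2)`. The wildcard query consult* can then retrieve all the forms of the verb.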
4. Conclusions
One of the main conclusions that may be drawn from this paper is that students must have a minimum level of instrumental/documentary competence not only to practise translation professionally but also to complete their training as translators. This competence may be acquired or improved through the kind of translation tasks proposed in this paper.
The use of a DIY corpus as a documentation resource during translator train-
ing is particularly attractive because texts retrieved from the Internet provide a
quick and easy source of documentation. Even though individual texts on their
own may not provide minimum quality guarantees and therefore cannot serve
as the basis for translation decisions, those decisions can be validated by
the corpus as a whole. Moreover, the methodology described allows students to
systematise the manner in which information is obtained, and to develop useful
strategies for solving translation problems. It should not be forgotten that DIY
corpora provide solutions according to the information in the texts they contain.
Thus, it is important to ensure that students’ documentation skills allow them to
retrieve the kinds of texts they need for their translation task. Otherwise they will
not be able to create the DIY corpus that will give them the answers they need.
References
Askehave, I. 2000. “The Internet for teaching translation”. Perspectives: Studies in Translatology 8, 2: 135–143.
Aston, G. 1999. “Corpus use and learning to translate”. Textus 12, 2: 289–314.
Austermühl, F. 2006. “Training Translators to Localize”. In Translation Technology and its Teaching (with much mention of localization), Pym, A., Perekrestenko, A., Starink, B. (eds). Intercultural Studies Group, Universitat Rovira i Virgili. <http://isg.urv.es/publicity/isg/publications/technology_2006/index.htm>
Bowker, L. 2002. “Working Together: A Collaborative Approach to DIY Corpora”. In Language Resources for Translation Work and Research – LREC Workshop #8. <http://mt-archive.info/LREC-2002-WS-LangResTransl.pdf> 29–32.
Cabré Castellví, M. T. 2001. “Consecuencias metodológicas de la propuesta teórica (I)”. In La
terminología científico-técnica: reconocimiento, análisis y extracción de información formal
y semántica, Cabré Castellví, M. T., Feliu, J. (eds), 19–25. Barcelona: IULA – UPF.
Codina, L. 2002. “Información documental e información digital”. In Manual de Ciencias de la Documentación, López-Yepes, J. (ed.), 301–316. Madrid: Pirámide.
Corpas, G. 2001. “Compilación de un corpus ad hoc para la enseñanza de la traducción inversa
especializada”. TRANS. Revista de traductología 5: 155–184.
Faber, P. 2002. “Investigar en Terminología”. In Investigar en Terminología. Interlingua, Faber, P.,
Jiménez Hurtado, C. (eds), 3–23. Granada: Editorial Comares.
Gamero, S. 1998. La traducción de textos técnicos (alemán-español). Géneros y subgéneros. [PhD
Thesis]. Universitat Autònoma de Barcelona.
Hundt, M., Nesselhauf, N., Biewer, C. (eds). 2007. Corpus Linguistics and the Web. Amsterdam/
New York: Rodopi.
Hurtado, A. 2001. Traducción y traductología: introducción a la traductología. Madrid: Cátedra.
Lüdeling, A., Evert, S., Baroni, M. 2007. “Using web data for linguistic purposes”. In Corpus Lin-
guistics and the Web, Hundt, M., Nesselhauf, N., Biewer, C. (eds), 7–24. Amsterdam/New
York: Rodopi.
Maia, B. 2003. “Training Translators in Terminology and Information Retrieval using Compa-
rable and Parallel Corpora”. In Corpora in Translator Education, Zanettin, F., Bernardini,
S., Stewart, D. (eds), 43–54. Manchester: St. Jerome.
Mayoral, R. 1997/1998. “La traducción especializada como operación de documentación”.
Sendebar. Boletín de la Facultad de Traducción e Interpretación, 3: 185–192.
Nord, C. 1991. Text analysis in translation. Theory, methodology, and didactic application of a
model for translation-oriented text analysis. Amsterdam/Atlanta: Rodopi.
PACTE 2000. “Acquiring translation competence. Hypotheses and methodological problems of a research project”. In Investigating translation. Selected papers from the 4th International Congress of Translation, Beeby, A. et al. (eds), 99–106. Amsterdam/Philadelphia: John Benjamins.
Samson, R. 2005. “Computer-Assisted Translation”. In Training for the New Millennium: Pedagogies
for Translation and Interpreting, Tennent, M. (ed.), 101–126. Amsterdam/Philadelphia:
John Benjamins.
Sánchez-Gijón, P. 2004. L’ús de corpus en la traducció especialitzada. Barcelona: IULA/UPF
– Dept. de Traducció, Tradumàtica /UAB.
Sánchez-Gijón, P. 2005. “La extracción de conocimiento y terminología a partir de corpus ad
hoc: el uso de documentos digitales de la web pública”. Lingüística Antverpiensia 3: 179–
202.
Scott, M. 1996. WordSmith Tools. Oxford: Oxford University Press.
Tennent, M. 2005. Training for the New Millennium: Pedagogies for Translation and Interpreting.
Amsterdam/Philadelphia: John Benjamins.
Tognini-Bonelli, E. 2001. Corpus Linguistics at Work. Studies in Corpus Linguistics 6.
Amsterdam/Philadelphia: John Benjamins.
Varantola, K. 2003. “Translators and Disposable Corpora”. In Corpora in Translator Education,
Zanettin, F., Bernardini, S., Stewart, D. (eds), 55–70. Manchester: St. Jerome Publishing.
Zanettin, F. 2002. “Corpora in Translation Practice”. In Language Resources for Translation Work
and Research – LREC Workshop #8.
<http://mt-archive.info/LREC-2002-WS-LangResTransl.pdf> 10–14.
Zanettin, F., Bernardini, S., Stewart, D. (eds). 2003. Corpora in Translator Education. Manches-
ter: St. Jerome.
Evaluating the process and not just
the product when using corpora
in translator education

Electronic corpora and corpus analysis tools are resources that can improve
the way students acquire translation competence. If, as translator trainers, we
wish to develop our students’ competence to solve translation problems, then
we need to provide them with strategies to use existing resources and tools, to
create new ones and to reap the maximum benefit possible from them. We ad-
vocate a type of training that facilitates the development of students’ strategies,
and attempts to evaluate the acquisition of these strategies.
Our methodological approach is based on translation tasks organised around
learning objectives and includes evaluation of the translation process and prod-
uct. This methodology is student-centred, since it allows the student to be the
focus of the learning process, and comprehensive, in that it takes into account
the objectives and all aspects of the learning context in order to develop appro-
priate materials and evaluation.
We suggest that if one of the learning objectives within a translation course
is to grasp how to use corpora, evaluation of this objective should include the
process and not be limited to the overall quality of the product – the translation.
Examples are given of how the use of corpora and corpus-related software can
be evaluated other than by simply examining the final translation. The results of
some of the students’ own evaluations of the methodology are included.
Key words: evaluation, learning process, corpus use, translator education
1. Introduction
Many changes have occurred in the translation profession over the last few decades in terms of the quantity of texts that need to be translated in a wide variety of fields, the speed at which translations are required, the diversity of document formats used, etc. We sometimes wonder how translators were able to do their job before electronic resources and tools were widely available. Every phase in a translation project can be assisted by a range of computer tools and resources, from specialised search engines to translation memories and spelling and grammar checkers, which help to make the process faster and more accurate and to deliver a product of higher quality. As for the aids available to the translator in the documentation phase, electronic corpora and corpus analysis tools can improve not only the way the profession is practised, but also the way translation teachers teach and translation students are trained.
130 Patricia Rodríguez Inés
Today, translation students increasingly need to be trained to cope with a world that is ever more demanding in terms of IT skills. We may, however, wonder how translation teachers can be expected to provide specialised insight into many fields of knowledge and, for example, keep up with the pace at which science and technology develop. Traditionally, teachers were supposed to have an answer for every question, i.e. they were regarded as information providers. However, given the requirements of the professional market awaiting translation students, the optimal role a translation educator can play nowadays is that of an information facilitator, which involves enriching students’ learning processes, helping them wherever necessary and, above all, stimulating the development of their operative knowledge, i.e. their know-how. Translation trainees need to develop skills that they can apply to new situations in order to solve new problems, and to acquire a certain degree of expert knowledge even under the tightest of deadlines.
Translation courses need to be designed to help students acquire these skills.
The first part of this chapter focuses on the advantages of incorporating the learning of
corpus use into a task-based translation methodology. Translation teaching with
corpora constitutes a step forward in relation to traditional translation teaching,
since the use of corpora reduces the prominence of the teacher’s intuition in the
classroom and increases the importance of the student, as well as that of the cor-
pus as a documentary resource. The second part of the chapter is concerned with
evaluating this methodology. Examples are given of how the use of corpora and
corpus-related software can be evaluated other than by simply examining the final
translation. The results of some of the students’ own evaluations of the methodol-
ogy are included.
2. Theoretical background
With technological developments revolutionising the translation profession, translator trainers should draw on the new pedagogical approaches that are available in order to train translators for the 21st century. We would like to see a shift from a methodology that relies almost completely on the teacher’s knowledge and experience to a methodology that focuses on the students’ needs (for instance, selection and organisation of materials on the basis of an analysis of students’ needs, student self-evaluation, etc.). In practice, however, these changes have not yet been implemented in many translation courses. Reform is therefore still necessary in some centres if translation teaching is to evolve from a methodology based on the teacher’s intuition to a corpus-based approach that gives the student greater responsibility. In our teaching proposal, learning how to use corpora to translate is a learning objective and is linked to translation competence. Competence is understood as the interaction between knowledge, skills and attitudes for the purpose of carrying out a task in an appropriate way (see Figure 1).
Figure 1. Evolution in translator education (from translator education based on teacher intuition, through the introduction of corpus use to translator education, to translator education that includes learning corpus use in a student-centred, task-based methodology to develop translation competence)
The learning process in the use of corpora in translator education 131
We are in the middle of the Bologna process, which promotes competence-
based learning within the European Space for Higher Education. These compe-
tences should be applicable to the professional world. Corpus-based work offers a
wide range of possibilities in the translation classroom and can be easily adapted
to competence-based training, focusing on “learning how to learn” and profes-
sional requirements. Work with corpora is based on activities that involve search-
ing and analysing data, and therefore strengthens the sense of learning through
discovery, as well as through reorganising and building upon previous knowledge.
Furthermore, corpus-based learning includes the possibility of working coopera-
tively or in a highly autonomous way.
Translation teaching has a long history. However, the teaching of professional translation in European universities began less than 50 years ago. Our proposal for using corpora in translation teaching has been inspired by three important contributions to the teaching of written translation made in the last thirty years:
1. Teaching based on learning objectives.
2. Constructivism.
3. The translation task-based approach.
Delisle’s L’analyse du discours comme méthode de traduction (1980) marks the be-
ginning of the incorporation of a well-defined methodology based on learning
objectives into translation teaching. In his 1993 publication, he clearly establishes
the elements that a teaching method needs to take into account:
A teaching method must clearly delimit the subject matter to be taught, grade the
difficulties, set learning objectives, specify the means of attaining them, establish
a progression in the training and, finally, provide for ways of evaluating observable
performance.
(Delisle 1993: 15)
Kiraly, in A social constructivist approach to translator education (2000), advocates
the development of a student-based teaching methodology for translation and
interpreting trainees, and expresses dissatisfaction with the lack of a systematic
pedagogy of translation. He feels that translation teaching at the time to which
his book refers had several shortcomings, such as teachers’ self-image as the focal
point of classes and their acceptance of a passive role for students; the absence
of distinction between the different components of translation competence; and
scholars’ failure to criticise past teaching styles. The proposal that is partially set
out here deals with some of the deficiencies identified by Kiraly in his social con-
structivist approach to translation teaching.
Finally, Hurtado’s (1992) translation task-based teaching approach, which is organised around learning objectives, draws on positive contributions from earlier literature to design a pedagogical method that is student-centred and translation process-oriented, and which allows for the integration of
all the elements involved in the learning process, including evaluation. González
Davies’ work (2003) is another fundamental reference as regards applying a task-
based approach to translation teaching, especially within a social constructivist
approach; she designs translation tasks that involve a high degree of interactivity
and enhance the learner’s autonomy.
As far as corpus use and learning to translate (CULT) is concerned, the aforementioned approaches entail a range
of benefits. Firstly, clearly established learning objectives are required in order
to orientate the student’s learning process. Learning objectives or competences
are the starting and finishing points in a learning journey. Secondly, constructiv-
ism supplies the key concept of knowledge restructuring, an idea that is perfectly
suited to work with corpora, as such work requires a process of formulating hy-
potheses and queries, analysing data, coming to conclusions and, possibly, refor-
mulating initial hypotheses or queries. The use of authentic materials and tasks,
as advocated by constructivism, is also highly appropriate for corpus work, since
corpora contain real texts and the activities carried out with them resemble real-
life activities that professional translators perform when consulting parallel texts
or other types of documents. Thirdly, the task-based approach has been success-
fully applied to areas such as language teaching for a long time and has proved to
be very helpful in terms of planning and organising work. Lastly, the task-based
approach stresses the idea of learning through use, i.e. learning through experi-
ence and practice, another fundamental concept of corpus work, given that such
work involves both declarative knowledge (know what) and, most importantly,
operative knowledge (know how, applicable to building and interrogating a cor-
pus, extracting and interpreting relevant data, etc.). In short, learning to translate
and the use of corpora can be combined within a single realistic translation task.
Translation teaching that only draws on the teacher’s experience places com-
plete responsibility for what the students learn on the teacher and his/her intu-
ition about what is right or wrong, common or uncommon. Corpora can provide
alternative sources of authority and have recently been introduced to the world
of translation teaching with a view to providing empirical data and authentic material within the classroom. Corpora also enhance both quantity and quality: quantity, because there are more texts to consult during the documentation phase; and quality, because the resulting translations resemble original texts in the target language more closely than ever, avoiding “translationese”. However, despite
the great advantages of using electronic corpora for translation teaching, the ex-
amples included in the literature tend to be rather anecdotal and a well-founded
methodology remains to be established. Corpus linguistics does have its own
methodology, which is based on a process of observation, analysis and generalisa-
tion, but a methodology for translation teaching with corpora has not yet been
explored in depth. We have started to work in this field and are now testing some
materials with students. These materials follow a task-based approach that uses a
methodology organised around systematically, coherently and comprehensively
designed and structured learning objectives, tasks and teaching units.
For those who are not familiar with the task-based approach, tasks are the ba-
sic organisational units of the learning process and they make up larger structures
called teaching units. A learning objective is, in simple terms, what we as teachers
expect our students to achieve, either as regards a single task or a whole teach-
ing unit (Delisle 1993, 1998). Several items need to be taken into consideration
to build a teaching unit, such as the learning context and students’ level of pro-
ficiency. It is vital to assign learning objectives to the unit and to organise it into
tasks. Each task should have its own learning objective(s), a detailed explanation
of what the student has to do, a list of the materials required to carry out the task,
and a description of the evaluation to be applied. In other words, the task-based
approach allows the teacher to organise every aspect of the teaching situation that
he/she has to take into account (what to teach, how to teach, when to teach and
how to evaluate the results of the learning process).
3. Pedagogical framework and proposal
We have designed a full proposal for the use of electronic corpora in translation
teaching for different levels of proficiency in translation and corpus use. This full
proposal, which takes students’ previous knowledge and their learning context
into account, includes learning objectives related to a competence, teaching units
and a proposal for the evaluation of the use of corpora (Rodríguez Inés 2008).
However, due to space limitations, our proposal is only sketched out here.
3.1 Competences and learning objectives
Learning objectives are of paramount importance in the design of any well-founded pedagogical proposal. They can be regarded as a teaching guide and as
a reference for evaluation purposes. In general, such objectives are related to the
sub-competences that make up translation competence. Our proposal is based on
the PACTE translation competence model below (see Figure 2).
In this proposal, the learning objectives envisaged for the use of corpora to
translate are related to the instrumental sub-competence, which covers the use of
documentary resources and IT tools applied to translation, i.e. specific software,
document and information management, a critical eye for documentary resourc-
es, skills for analysing, interpreting and extrapolating data, etc.
Figure 2. PACTE’s Model of Translation Competence (PACTE 2003)
Although some authors do not make learning objectives explicit where the use of corpora is concerned, or do not refer to the sub-competences that the student is expected to acquire, they do address these issues in other terms. For example, Aston says that “to use corpora effectively in increasingly independent research, learners need technical, methodological, and conceptual knowledge and abilities” (Aston 2001: 24) and Zanettin comments that “... translators need to be able to see patterns and regularities both within a language and across languages” (Zanettin 1994: 109), while Varantola (2003: 69) lists the competences required to use corpora in translation, dividing them into two groups.
Corpus compilation
• Corpus design and design criteria.
• Search strategies and search word selection.
• Source criticism to assess the reliability of corpus texts.
• Assessment of corpus adequacy and relevancy.
• Software literacy in general.
• Selection of Internet search engines.
• Integrated use of word processing tools and corpus tools.
Use of corpus information
• Deductive corpus analysis skills in general.
• Use of preliminary corpus information for more targeted compilation criteria.
• Use of corpus evidence for translational decisions.
• Corpus evaluation and decision-making skills.
• Distinctions between permanent corpus collections and targeted, disposable corpora.
• Overall corpus knowledge management skills.
The European Space for Higher Education is encouraging the adoption of a competence-based approach to training in which competences are seen as the
building blocks within a curriculum. Within competence-based training there are
general competences (i.e. those that should be present in all disciplines) and spe-
cific competences (i.e. those that belong to a particular discipline and therefore
define a given profile). Competences specific to the discipline of translation and
interpreting are still under discussion.
To the best of our knowledge, proposals regarding the use of corpora to solve
translation problems as a specific competence within a competence-based train-
ing approach have not yet been made. Likewise, the elements defining this specific
competence and which are key to evaluation have yet to be listed. Our proposal
of one specific competence in relation to the use of corpora in translator training
is presented and briefly explained here. An example of a teaching unit is given to
illustrate the elements constituting that specific competence, the learning objec-
tives, tasks and evaluation procedures.
Curriculum design for university degrees should take into account a number
of issues, ranging from formal considerations, such as the number and distribu-
tion of credits required to obtain a degree, to contextual issues, such as students’
profiles and previous training, market expectations, etc. As mentioned previously,
specific competences should be established for degrees in translation and inter-
preting, so that each educational centre interested in following a competence-
based approach would have a catalogue or a list of competences specific to its
discipline and on the basis of which it could work. Bearing all this in mind, we
would like to suggest one specific instrumental sub-competence, namely the abil-
ity to use electronic corpora adequately in order to solve translation problems in an
appropriate manner.
This specific sub-competence is composed of four elements:
1. To assimilate basic principles involved in working with corpora.
2. To build corpora.
3. To handle corpus-related software.
4. To use corpora to solve translation problems.
We advocate the use of corpora in translation classes at an early stage of the learn-
ing process, and have established two phases in the pedagogical progression in-
volved in using corpora to translate, namely an introductory phase and a consoli-
dation phase. In the introductory phase, students acquire basic methodological
and technical principles where corpus work is concerned, while the consolida-
tion phase is geared to enabling them to reap the maximum benefit possible from
corpora in translation. The four elements mentioned above are then distributed
between these two phases in the form of general and specific learning objectives,
i.e. statements that describe the results expected after a learning process has taken
place (with objectives of the former variety being less observable than those of
the latter).
The overall aim of the introductory phase is for the student to develop an
understanding of certain basic methodological principles related to the use of
corpora for translation purposes, and to be able to work with corpora at a basic
level. The following three general learning objectives have been defined for the
introductory phase:
1. To identify basic principles of using corpora for translation.
2. To use basic functions of corpus-related software.
3. To use corpora at a basic level to solve translation problems.
There should be a certain degree of flexibility as regards the order in which the above general objectives are accomplished. While the acquisition of basic principles related to corpus work necessarily constitutes the first stage, it is also true that the student does not need to assimilate all those principles to be able to start using corpora in translation. Meanwhile, the student can become familiar with corpus-related software as part of the process of acquiring some of the aforementioned principles.
The overall aim of the consolidation phase in the use of corpora to translate is
for the student to acquire more advanced knowledge and more specialised abili-
ties with regard to the use of corpora, so as to make the most of such resources
when translating. The following three general learning objectives have been de-
fined for the consolidation phase:
1. To build corpora.
2. To use advanced functions of corpus-related software.
3. To use corpora at an advanced level to solve translation problems.
In order to be operational, general learning objectives need to be made explicit, i.e. expressed in observable terms. Each of the aforementioned general objectives
has been broken down into various specific objectives, only some of which are
presented here.
3.2 Teaching units
We have designed several teaching units with the aim of covering most of the
proposed learning objectives. The contents related to these learning objectives
are varied and include the compilation, use and evaluation of different types of
corpora; the use of basic and advanced functions of corpus analysis software; and
the use of corpora to solve translation problems of different kinds, with vary-
ing levels of difficulty and corresponding to a range of fields of specialisation.
A wide range of resources and materials has been used, including dictionaries, glossaries, thesauri, comparable and parallel corpora, texts of different types and genres, and software such as corpus analysers, text aligners, web downloaders and offline browsers. Task activities are also varied, with students working individually, in pairs and in groups; likewise, evaluation takes the form of self-assessment, peer assessment or assessment by the teacher. Relevance, variety
and progression in terms of difficulty levels have been the main criteria on which
decisions have been based regarding the design of teaching units, the selection of
materials, how to use these materials and how to evaluate the student’s learning
process.
An example of these units is “Ingredients for my corpus: quality texts”.
Teaching unit: “Ingredients for my corpus: quality texts”
Difficulty level: Consolidation
Subject: Specialised translation (Spanish-English)
Learning objectives:
– To build corpora
    – To develop a critical attitude towards parallel texts
    – To build an ad hoc bilingual comparable corpus
– To use advanced functions of corpus-related software
    – To extract concordances: simple, using truncated searches, using context words (refresher)
    – To find the most appropriate sorting method within a concordance set (refresher)
    – To re-sort the context (refresher)
    – To download texts from the Internet
    – To extract and interpret collocations
    – To extract and interpret clusters
– To use corpora at an advanced level to solve translation problems
    – To select the appropriate corpus/corpora according to translation needs (refresher)
    – To use co-textual information from concordances
Structure:
– Task 1: Identifying potential translation problems in my source text
– Task 2: Analysis of documentation needs
– Task 3: Internet text quality evaluation
– Task 4: Exploring the possibilities of the resource I have created
– Final task: Using an ad hoc corpus to solve translation problems
As can be seen, the teaching unit selected belongs to the consolidation phase
and focuses on learning how to build and use corpora. In this case, it also includes
re-examining learning objectives already dealt with in a previous teaching unit
corresponding to the introductory phase, with a view to emphasising their importance. The unit was organised into five tasks. The student’s use of corpora was evaluated in the final task, i.e. the task in which the ability to achieve several objectives was tested.
3.3 Evaluation
We will now focus on evaluation and the changes entailed by the use of technical
resources in teaching, as regards what is evaluated and how. To that end, we will
describe the experience we have gained from teaching translation with corpora
and evaluating students’ performance.
In our concept of evaluation, it is not only the translation product that should
be assessed, but also the process in which the students use the resources (corpora)
available. This notion is essential in formative evaluation, which is a source of
information that can help improve translation teaching and learning. In other
words, the purpose of evaluation is not just to assign or receive marks, but for
teachers and students to learn about the learning process and progress. Where
translation is concerned, both the product and the process can be evaluated; it is
simply a case of using different instruments for each aspect. The instruments (us-
ing the term in a broad sense) that we developed to evaluate students’ learning of
the use of corpora throughout the semester were the following:
– An observation chart for the teacher to record the students’ observable prog-
ress.
– A learning diary for the students to keep a record of what they feel they have
learned.
– A self-evaluation questionnaire for the students to assess their own learning
at the end of every teaching unit.
– A questionnaire for the students to comment on the contents and methodology used at the end of every teaching unit.
Other instruments were designed to test the way corpora are used by students
when translating. These instruments were created to be used in the teaching units
in which, by way of a final task, students are asked to translate a text using corpora
only. The instruments in question are the following:
– A source text to be translated.
– A list of corpus-related items to be assessed.
– A questionnaire on the steps taken by the student when using the corpora
provided in order to solve certain translation problems.
– An evaluation scale for appraising the use of corpora to translate, to be used
in conjunction with a chart in which the data obtained from every student’s
translated text is combined with that arising from their questionnaire on the
steps taken.
This chapter will now focus on these four instruments and how they were used. Before doing so, however, we will describe our procedure for preparing the evaluation of the use of corpora to translate:
1. The learning objectives to be evaluated were selected.
2. The items to be assessed (i.e. those matching the chosen learning objectives) were selected.
3. A percentage of the overall mark was assigned to each item on the basis of the
level of proficiency expected of the students.
4. The source text to be translated was selected.
5. A limited number of translation problems that could be solved by using cor-
pora were selected from the source text, and the questionnaire was prepared
accordingly.
6. The students were provided with the source text, a predetermined set of cor-
pora and the questionnaire on preselected translation problems.
3.3.1 Selected learning objectives

Although this teaching unit covered several learning objectives, only a few were chosen to be evaluated in the final task. These were:
– To use advanced functions of corpus-related software
    – To extract concordances: simple, using truncated searches, using context words (refresher)
    – To find the most appropriate sorting method within a concordance set (refresher)
    – To re-sort the context (refresher)
    – To extract and interpret collocations
    – To extract and interpret clusters
– To use corpora at an advanced level to solve translation problems
    – To select the appropriate corpus/corpora according to translation needs (refresher)
    – To use co-textual information from concordances
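The collocation and cluster objectives above can be made concrete with a short sketch. The Python code below is a simplified illustration, not the implementation used by WordSmith Tools or any other concordancer: it counts raw co-occurrence frequencies within a fixed window (real tools often rank collocates with association measures such as mutual information) and treats a cluster as any recurring n-word sequence that contains the search word. The function names, the sample text and the default window and cluster sizes are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens. Punctuation, including sentence breaks, is
    dropped, so windows and clusters may cross sentence boundaries."""
    return re.findall(r"\w+", text.lower())


def collocates(tokens, node, span=2):
    """Raw frequency of words within `span` tokens to the left or right of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts


def clusters(tokens, node, size=3, min_freq=2):
    """Word sequences of length `size` that contain `node` and occur
    at least `min_freq` times."""
    counts = Counter(
        tuple(tokens[start:start + size])
        for start in range(len(tokens) - size + 1)
        if node in tokens[start:start + size]
    )
    return {gram: freq for gram, freq in counts.items() if freq >= min_freq}


if __name__ == "__main__":
    # Hypothetical mini-corpus of English dermatology sentences.
    sample = ("Skin lesions are common. Primary skin lesions include macules. "
              "Secondary skin lesions include crusts.")
    tokens = tokenize(sample)
    print(collocates(tokens, "lesions", span=1).most_common(2))
    # [('skin', 3), ('include', 2)]
    print(clusters(tokens, "lesions", size=2))
    # {('skin', 'lesions'): 3, ('lesions', 'include'): 2}
```

Interpreting the output is the part the learning objective targets: a student still has to decide whether “skin lesions” is a term worth reusing in the translation or simply an artefact of a small corpus.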
3.3.2 Item list

A list of items liable to be evaluated in relation to the use of corpora for translation
was drawn up for the full proposal. This list is open and items are displayed in no
particular order.
– Good practices in relation to building ad hoc corpora.
– Good practices in relation to naming the files to be included in a corpus.
– Appropriateness of the corpus/corpora selected.
– Appropriateness of the software used.
– Appropriateness of the search string entered.
– Appropriateness of the search restrictions applied.
– Appropriateness of the sorting of results.
– Quality in terms of text alignment.
– Appropriateness of the use of available software functions (extraction of collocates, clusters, plot, keywords, etc.).
– Appropriateness of the use of annotation.
– Acceptability of the equivalent proposed.
– Etc.
It should be stressed that each item evaluated in a task is directly related to a learning objective. For example, the learning objective of a given task within a teaching unit may consist of identifying the importance of looking on both sides of a keyword or term in order to extract conceptual or collocational information. In that case, the related item to be assessed would be the appropriateness of the sorting of results (alphabetical reordering of the words surrounding the keyword) (see Table 1).
Table 1. Example of correlation between a learning objective and the item to be assessed
Learning objective: To identify the importance of looking on both sides of a keyword or term in order to extract conceptual or collocational information
Item to be assessed: Appropriateness of the sorting of results
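The link between looking on both sides of a keyword and the sorting of results can be illustrated with a minimal key-word-in-context (KWIC) sketch in Python. It simplifies what concordancers do and is not the algorithm of any particular tool: a trailing asterisk stands for a truncated search, and the concordance lines can be sorted alphabetically on the words to the right of the node (R1, then R2) or on those to its left, read outwards (L1, then L2). The function names and sample sentences are hypothetical.

```python
import re


def tokenize(text):
    """Lower-case word tokens; punctuation is discarded."""
    return re.findall(r"\w+", text.lower())


def concordance(tokens, query, width=4):
    """Return (left, node, right) KWIC lines for every token matching `query`.

    A trailing '*' acts as a truncation wildcard, e.g. 'lesion*' matches
    'lesion' and 'lesions'.
    """
    pattern = re.compile(re.escape(query).replace(r"\*", r"\w*"), re.IGNORECASE)
    lines = []
    for i, token in enumerate(tokens):
        if pattern.fullmatch(token):
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, token, right))
    return lines


def sort_right(lines):
    """Alphabetical sort on the right-hand context: R1, then R2, and so on."""
    return sorted(lines, key=lambda line: line[2])


def sort_left(lines):
    """Alphabetical sort on the left-hand context read outwards: L1, then L2."""
    return sorted(lines, key=lambda line: line[0][::-1])


def show(lines, width=30):
    """Print KWIC lines with the node word aligned in a central column."""
    for left, node, right in lines:
        print(f"{' '.join(left):>{width}}  [{node}]  {' '.join(right)}")


if __name__ == "__main__":
    sample = ("A macule is a flat lesion. A papule is a solid raised lesion. "
              "Pustules are raised lesions with pus. Crusts are secondary lesions.")
    lines = concordance(tokenize(sample), "lesion*", width=2)
    print("Sorted on left context:")
    show(sort_left(lines))
    print("Sorted on right context:")
    show(sort_right(lines))
```

With the left-hand sort, the two lines with “raised” immediately before the node end up next to each other, which is the kind of collocational pattern that sorting on only one side would hide.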
In the teaching unit being used here as an example, the items selected were the
following:
1. Appropriateness of the corpus/corpora selected.
2. Appropriateness of the search string entered.
3. Appropriateness of the search restrictions applied.
4. Appropriateness of the sorting of results.
5. Appropriateness of the use of available software functions (extraction of collocates, clusters, plot, keywords, etc.).
6. Acceptability of the equivalent proposed.
3.3.3 Source text

The source text chosen to be translated and to serve as a basis for the evaluation
of the use of corpora as a documentary resource was the web page on veterinary
dermatology shown below. Some of the translation problems selected have been
indicated (see Figure 3).
Translate the following text into English. This text is part of a website (http://www.dermovet.com) that offers information and services on veterinary dermatology to a non-expert or semi-expert readership. The translated text will have the same type of target readers and function as the source text.
Terminología Dermatológica Veterinaria / Centros Veterinarios / Pruebas diagnósticas / Novedades Terapéuticas / Visita Dermatológica
Mascotas...
¿Qué le sucede a mi perro?
¿Qué le sucede a mi gato?
Terminología dermatológica veterinaria
Lesiones Primarias
Lesiones Secundarias
Lesiones Primarias o Secundarias
Mácula: Lesión caracterizada por un cambio de color de la piel, sin elevación ni engrosamiento de la misma. Son focales, bien circunscritas y de un tamaño inferior a 1 cm. Existen varios tipos: eritematosa (atopía), hiperpigmentada (lentigo), hipopigmentada (vitíligo), hemorrágica (intoxicación, reacción a fármaco).
Mancha: Lesión idéntica a la mácula pero de un diámetro mayor a 1 cm, y suelen ser menos bien delimitadas.
Pápula: Área cutánea con relieve, sólida y circunscrita de hasta 1 cm de diámetro. Puede ser folicular o interfolicular. Son indicativas de pioderma y/o parasitosis, en la mayoría de casos.
Eritema: Enrojecimiento de la piel debido a la vasodilatación de los vasos dérmicos superficiales. Indica inflamación cutánea.
Placa: Lesión aplanada mayor de 1 cm de diámetro, la mayoría de veces debido a la unión de varias pápulas. Son indicativas de pioderma y menos frecuentemente de enfermedades autoinmunes.
Pústula: Elevaciones bien delimitadas de los estratos superficiales de la epidermis, normalmente con contenido purulento. Hay de varios tipos: sépticas (piodermas), estériles (Pénfigo Foliáceo), eosinofílicas (alergias, parasitosis), linfocíticas (Linfoma Epiteliotrópico). Pueden ser foliculares o interfoliculares.
Vesícula: Lesión similar a la pústula pero con contenido seroso o con exudado inflamatorio y de diámetro inferior a 1 cm de diámetro. Se origina por un acúmulo de fluido en los espacios intercelulares que aparece y desaparece en minutos u horas.
Bulla: Vesícula mayor de 1 cm de diámetro.
Habón: Lesión con relieve consistente en un edema intercelular de las células de la epidermis. Típica de las reacciones de hipersensibilidad tipo I. Es la lesión principal en reacciones de urticaria y en las reacciones positivas al Skin-Test.
Nódulo: Lesión elevada mayor de 1 cm de diámetro, bien delimitada y sólida. Suelen estar bien infiltradas en la dermis. Típico de neoplasias y Paniculitis Nodular Estéril.
Tumor: Masas neoplásicas tanto benignas como malignas. Se utiliza cuando hay nódulos muy grandes.
Quiste: Cavidades forradas por epitelio localizadas en el interior de la piel y normalmente con contenido glandular.
Figure 3. Source text to be translated
3.3.4 Questionnaire
A questionnaire was designed to test the way students use corpora at certain points in the learning process, in order to evaluate their acquisition of competences and their progress in using corpora for translation purposes. The questionnaire, which contained five preselected translation problems of different kinds, was prepared in order to obtain information on the steps the students took to solve those problems. Since the teacher knew exactly which corpora and software tools were available to the students in the final task, and knew that no other documentary resources were to be consulted, he/she was able to prepare very specific questions related to the learning objectives that the students were supposed to have achieved during the teaching unit. Assessing the students’ translations and their answers to the questions therefore gave direct information on whether the learning objectives in question had actually been achieved. The following example is taken from the questionnaire prepared to evaluate the use of corpora to solve a terminological problem in the source text (see Table 2).
Table 2. Questionnaire to be filled in by the student with regard to his/her use of corpora to solve a translation problem
Problem 3 (Terminology): “habón”
Question | Answer
What word or string of words or characters have you searched for? In which corpus/corpora have you searched? | Search 1: ____ Corpus: ____; Search 2: ____ Corpus: ____; Search 3: ____ Corpus: ____
Have you restricted your search in any way (regional variant, oral or written mode, date, etc.)? |
Have you re-sorted the concordances alphabetically? |
Have you re-sorted the concordances alphabetically to the right or to the left of the keyword? Specify the sorting criterion used (e.g. L1, R1). |
State whether you have used other functions from WordSmith Tools (Grow, Shrink, Clusters, Collocates, etc.). |
Write down the various translation solutions you had considered before making your final choice. |
Justify your solution. |
Solution suggested by the student: “your solution”
As can be seen, the questionnaire contains questions related to the previously selected items, which are in turn related to the previously selected learning objectives. For the learning objective and item used as an example earlier (see Table 1), the relevant questions from the questionnaire (see Table 2) are those shown below (see Table 3).
In the interests of data manageability, only five translation problems were studied. They were selected on the basis of variety and of the possibility of solving them with the corpora provided. While it might have been interesting not to preselect translation problems and instead to observe the real problems the students needed to solve with the help of corpora, doing so would have constituted a completely different study requiring a different perspective. In our case, we were interested in seeing how the students applied what they had learned to the use of the corpora available (if they used them at all) in order to solve various problems (the same problems for every student).
Table 3. Example of correlation between elements within the evaluation proposal contained herein
Learning objective | Item to be assessed | Question(s) (from questionnaire)
To identify the importance of looking on both sides of a keyword or term in order to extract conceptual or collocational information | Appropriateness of the sorting of results | – Have you re-sorted the concordances alphabetically? – Have you re-sorted the concordances alphabetically to the right or to the left of the keyword? Specify the sorting criterion used (e.g. L1, R1).
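The chain from learning objective to item to question shown in Table 3 can be kept explicit by storing it as data when the questionnaire is prepared. The Python sketch below does this for the single row of Table 3 and generates the questions for a given problem. The class and function names are hypothetical, and the check that every item has at least one question is an illustrative addition rather than part of the procedure described here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AssessedItem:
    """An item to be assessed, linked to its learning objective and its questions."""
    name: str
    learning_objective: str
    questions: List[str] = field(default_factory=list)


# The row of Table 3, stored as data.
SORTING = AssessedItem(
    name="Appropriateness of the sorting of results",
    learning_objective=(
        "To identify the importance of looking on both sides of a keyword or term "
        "in order to extract conceptual or collocational information"
    ),
    questions=[
        "Have you re-sorted the concordances alphabetically?",
        "Have you re-sorted the concordances alphabetically to the right or to the "
        "left of the keyword? Specify the sorting criterion used (e.g. L1, R1).",
    ],
)


def build_questionnaire(problem, items):
    """List the questions for one translation problem, each tagged with its item."""
    lines = [f"Problem: {problem}"]
    for item in items:
        if not item.questions:
            raise ValueError(f"item {item.name!r} has no question linked to it")
        lines.extend(f"[{item.name}] {q}" for q in item.questions)
    return lines


def objectives_covered(items):
    """Learning objectives on which the questionnaire actually gathers evidence."""
    return {item.learning_objective for item in items if item.questions}


if __name__ == "__main__":
    for line in build_questionnaire("habón", [SORTING]):
        print(line)
```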
3.3.5 Evaluation scale

We developed an evaluation scale based on the list of items referred to earlier, i.e. the items that can be evaluated when a text is translated using corpora. The scale ranges from 0 to 2 points, assigned according to whether each step taken by the student when using a corpus to solve a translation problem was incorrect (0), improvable (1) or correct (2). It should be noted that the evaluation scale is not applied to the item “acceptability of the equivalent proposed”, which is always assessed using the values “right” (√) or “wrong” (X).
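The two scales can be captured in a few lines. In the Python sketch below, which assumes that the chart is read cell by cell as text, items 1 to 5 take the values 0, 1 or 2 (an empty cell meaning that the step was not taken), while acceptability is recorded separately as √ or X. The names are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class StepScore(IntEnum):
    """Points for one step (items 1-5) taken on one translation problem."""
    INCORRECT = 0
    IMPROVABLE = 1
    CORRECT = 2


def parse_step(cell: str) -> Optional[StepScore]:
    """Read a chart cell for items 1-5; an empty cell means the step was not taken."""
    cell = cell.strip()
    if not cell:
        return None
    return StepScore(int(cell))  # raises ValueError for anything other than 0, 1 or 2


def parse_acceptability(cell: str) -> bool:
    """Acceptability has its own binary scale: '√' (right) or 'X' (wrong)."""
    cell = cell.strip()
    if cell == "√":
        return True
    if cell.upper() == "X":
        return False
    raise ValueError(f"unexpected acceptability mark: {cell!r}")
```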
The data arising from the questionnaire and from the translations produced
by the students are placed in the following chart (see Table 4).
The evaluation scale can be applied once the items to be evaluated have been selected and each assigned a percentage value, and once the problems to be studied have been chosen from the source text (see Figure 3). As stated previously, the data entered in the chart come from the translated text and, above all, from the students’ answers to the questionnaire for each translation problem. The questionnaire constitutes a record of what each student has done with the corpora and the related software provided, enabling us to assess whether the procedure followed was correct, incorrect or improvable. The evaluation scale is based on the idea that taking the right steps should lead to the right solution (or one of the possible right solutions). For this reason, the equivalent proposed is evaluated as a separate category and deemed to be either correct or incorrect, thus serving as a control variable. The following example illustrates certain prototypical cases (see Table 5).
Table 4. Chart for combining the data extracted from each student’s translation and questionnaire
Student A | Problem 1 | Problem 2 | Problem 3 | Problem 4 | Problem 5 | Partial mark (after application of %) | Final mark (out of 10, after application of %)
Item 1 (value X %) | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | |
Item 2 (value X %) | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | |
Item 3 (value X %) | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | |
Item 4 (value X %) | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | |
Item 5 (value X %) | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | 0/1/2 | |
Acceptability | √/X | √/X | √/X | √/X | √/X | |
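Reading the completed chart in Table 5 (e.g. 8/8 = 1 for an item with four filled cells, 4/6 for one with three), the partial mark for an item appears to be the points obtained divided by the maximum obtainable (2 points per filled cell), and the final mark appears to be ten times the weighted sum of the partial marks. The Python sketch below implements that reading. The weights, the sample scores and the treatment of an item with no filled cells are hypothetical.

```python
from typing import List, Optional, Sequence

MAX_POINTS = 2  # "correct" on the 0-2 scale


def partial_mark(scores: Sequence[Optional[int]]) -> float:
    """Points obtained over the maximum obtainable for the cells actually filled in.

    None marks an empty cell (step not taken). If no cell is filled, 0.0 is
    returned; this is an assumption, as the chapter does not cover that case.
    """
    filled = [s for s in scores if s is not None]
    if not filled:
        return 0.0
    return sum(filled) / (MAX_POINTS * len(filled))


def final_mark(rows: List[Sequence[Optional[int]]], weights: Sequence[float]) -> float:
    """Mark out of 10: ten times the weighted sum of the items' partial marks."""
    if len(rows) != len(weights):
        raise ValueError("one weight per item is needed")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("item weights must add up to 100%")
    return 10 * sum(w * partial_mark(r) for r, w in zip(rows, weights))


if __name__ == "__main__":
    # Hypothetical scores for items 1-5 (rows) on problems 1-5 (columns).
    rows = [
        [2, 2, 1, 2, 2],
        [2, 1, 1, 2, 2],
        [2, 2, 2, 0, 2],
        [1, 2, None, 2, 2],
        [2, 2, 2, 2, None],
    ]
    print(round(final_mark(rows, [0.2] * 5), 2))
```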
Table 5. Example of a completed chart, combining data from a student’s translation results and questionnaire
Key:
Problem 1: ¿Qué le sucede a mi...?
Problem 2: Son indicativas de pioderma... / Es indicativo de Pénfigo Foliáceo...
Problem 3: Habón
Problem 4: Tumor: masas neoplásicas tanto benignas como malignas. Se utiliza cuando hay nódulos muy grandes.
Problem 5: estrato córneo / basal
Item 1: Appropriateness of the corpus/corpora selected
Item 2: Appropriateness of the search string entered
Item 3: Appropriateness of the search restrictions applied
Item 4: Appropriateness of the sorting of results
Item 5: Appropriateness of the use of available software functions
0: Incorrect
1: Improvable
2: Correct
Acceptability (of the equivalent proposed)
√: Right
X: Wrong
Table 5 (continued)
Student A Problem Problem Problem Problem Problem PARTIAL FINAL MARK
1 2 3 4 5 MARK (out of 10 after
application of %)
Item 1 2 2 2 2 8/8 = 1 9.2
(value 20%)
Item 2 2 2 2 2 8/8 = 1
(value 20%)
Item 3 2 0 2 4/6 = 0.6
(value 20%)
Item 4 2 2 2 6/6 = 1
(value 20%)
Item 5 2 2 2 2 6/6 = 1
(value 20%)
Acceptability √ √ √ X √
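The final mark in Table 5 can be checked from the partial marks, where w_i is the weight of item i (20% for every item here) and p_i its partial mark:

```latex
\text{Final mark} = 10 \sum_{i=1}^{5} w_i \, p_i
                  = 10 \times 0.2 \times (1 + 1 + 0.6 + 1 + 1)
                  = 2 \times 4.6
                  = 9.2
```

Since 4/6 is approximately 0.67, the partial mark for Item 3 is shown rounded down to 0.6; the unrounded value would give 10 × 0.2 × (4 + 2/3) ≈ 9.33.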
As Table 5 shows, where the resolution of Problem 1 is concerned, the fact that the student did not use the corpora provided is not penalised in any way, because an appropriate solution was obtained (i.e. the empty cells represent steps that were not taken, but their omission is not penalised). The same applies to Problem 3, where the student omitted some of the possible steps but still obtained an appropriate solution. In the cases of Problems 2 and 5, the student followed all the steps and likewise obtained a correct solution. As for Problem 4, the last item (Acceptability of the equivalent proposed) tells us that the solution provided by the student was wrong, which results in a score of 0 in each of the empty cells in that column. In this case, we deduce that the omission of a step, namely the sorting of results, led the student to produce an incorrect answer. Our interpretation of the data is that, had the student sorted the concordances extracted from the corpus, he/she would have been able to extract relevant information that would have led him/her to the right answer. We can also interpret these data as meaning that the student in question probably needs further training in sorting results, and that the importance of looking at the keyword’s immediate context should be emphasised through new tasks designed to accomplish the corresponding learning objective.
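The rule applied in this discussion can be expressed as a small function. The Python sketch below works on one problem’s column, with None for an empty cell: when the equivalent proposed is acceptable, omitted steps are left empty and not penalised; when it is wrong, each empty cell is scored 0. The helper that lists the omitted steps as candidates for further training is a hypothetical addition illustrating the diagnostic use described above, not part of the authors’ procedure.

```python
from typing import List, Optional, Sequence

ITEM_NAMES = [
    "corpus/corpora selected",
    "search string entered",
    "search restrictions applied",
    "sorting of results",
    "use of available software functions",
]


def score_column(step_scores: Sequence[Optional[int]], acceptable: bool) -> List[Optional[int]]:
    """Apply the treatment of omitted steps to one problem's column (items 1-5).

    None marks an empty cell (step not taken). If the equivalent proposed is
    acceptable, omissions are not penalised and the cells stay empty; if it is
    wrong, every empty cell in the column is scored 0.
    """
    if acceptable:
        return list(step_scores)
    return [0 if s is None else s for s in step_scores]


def steps_to_reinforce(step_scores: Sequence[Optional[int]], acceptable: bool) -> List[str]:
    """Hypothetical helper: omitted steps in a column whose solution was wrong."""
    if acceptable:
        return []
    return [name for s, name in zip(step_scores, ITEM_NAMES) if s is None]


if __name__ == "__main__":
    column = [2, 2, 2, None, 2]  # hypothetical: sorting of results skipped
    print(score_column(column, acceptable=False))        # [2, 2, 2, 0, 2]
    print(steps_to_reinforce(column, acceptable=False))  # ['sorting of results']
```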
4. Results
The teaching unit presented here was tested on a group of 26 final-year Spanish students taking a course in specialised translation into English. At the end of the unit, the students were asked to fill in two questionnaires. The first asked them to rate, on a scale of 1 to 10, their acquisition of the competences the unit was geared to and the benefits of using corpora to translate. The second collected their opinions and comments on their level of satisfaction with the contents of the unit and the methodology used. The results from the first questionnaire showed that most students were confident that they had acquired the competences the unit was geared to. For example, the students’ average ratings for statements such as “I am able to use a monolingual comparable corpus in order to translate a text” and “Using corpora has helped me to feel more confident about my translation solutions” were 7.9 and 8.3 respectively. The students’ evaluation of the unit’s contents and methodology was also positive. For example, they stressed that using corpora enabled them to save time when translating: while collecting quality texts to build a corpus may take some time, the resulting resource helps to solve many of the translation problems that arise. Furthermore, the students said that learning how to use WordSmith Tools helped them a great deal to reap the full benefits of parallel texts (i.e. finding terms using cotextual information, using these terms appropriately, using authentic English syntax, etc.).
5. Conclusions
This paper has asserted the need for translation teaching to evolve and to adopt new methodologies. As stated, working with corpora brings authentic material and empirical data to language and translation research and teaching. More than ever, the use of corpora has made it possible to focus on the student, the translation task and the resources used, rather than on the teacher. Furthermore, corpora (as resources) and corpus linguistics (as a methodology and a new way of approaching language work) promote a sense of discovery that increases motivation and student autonomy, in addition to encouraging the use of IT tools and the processing of information in electronic format.
Looking ahead, a task-based approach that revolves around learning objectives linked to competences can provide a methodological framework for teaching translation with corpora, i.e. a teaching method that is systematic and comprehensive in that it allows all the elements involved in education to be integrated, making the processes of teaching and learning more coherent. Learning corpus use to translate is not just about teaching and learning how to use tools, but also about following a methodology that makes the process more systematic. As stated previously in this paper, if using tools is a learning objective, this learning should be evaluated.
Although the type of evaluation suggested here still needs to be refined, it can, within certain limits, provide information about the process of using a corpus and the origin of a student’s translation errors, as well as data that can be used to improve teaching and learning, since it helps to reveal where a learning objective has yet to be fully achieved. In such cases, the teacher can go back and modify the task corresponding to the learning objective in question, or even design new tasks to make sure that the objective is fulfilled.
References
Aston, G. 2001. “Learning with corpora: An overview”. In Learning with corpora, G. Aston
(ed.), 7–45. Bologna: CLUEB.
Beeby, A. 1996 Teaching Translation from Spanish to English. Ottawa: University of Ottawa
Press.
Bowker, L. 1999. “Using a corpus to assess student translations: A pilot study”. In PALC’99:
Practical Applications in Language Corpora. Papers from the International Conference at
the University of Lódz, 15–18 April 1999, B. Lewandowska-Tomaszczyk and P. James Melia
(eds), 529–540. Bern: Peter Lang.
Delisle, J. 1980. L’analyse du discours comme méthode de traduction. Cahiers de Traductologie
2. Université d’Ottawa.
Delisle, J. 1993. La traduction raisonnée. Manuel d’initiation à la traduction professionnelle
de l’anglais vers le français. Col. Pédagogie de la traduction. Les Presses de l’Université
d’Ottawa.
Delisle, J. 1998. “Définition, rédaction et utilité des objectifs d’apprentissage en enseignement
de la traduction”. In Los estudios de traducción: un reto didáctico, I. García Izquierdo and
J. Verdegal (eds). Col. Estudis sobre la traducció 5. Universitat Jaume I.
González Davies, M. 2003. (coord.). Secuencias. Tareas para el aprendizaje interactivo de la
traducción especializada. Barcelona: Octaedro.
Hurtado, A. 1992. “Didactique de la traduction des textes spécialisés”. In Actes de la 3ème
Journée ERLA-GLAT. Lexique spécialisé et didactique des langues 9–21. Brest: UBO-ENST.
Hurtado, A. 1999. Enseñar a traducir. Madrid: Edelsa.
Hurtado, A. 2007. “Competence-based curriculum design for training translators”. In The Inter-
preter and Translator Trainer (ITT). Vol. 1(2): 163–195.
Kiraly, D. 2000. A social constructivist approach to translator education. Manchester: St. Je-
rome.
PACTE. 2003. “Building a Translation Competence Model”. In Triangulating Translation: Per-
spectives in process oriented research, F. Alves (ed.), 43–66. Amsterdam: John Benjamins.
Rodríguez Inés, P. 2008. Uso de corpus electrónicos en la formación de traductores (inglés-espa-
ñol-inglés). PhD thesis. Departament de Traducció i d’Interpretació. Universitat Autònoma
de Barcelona.
Varantola, K. 2003. “Translators and disposable corpora”. In Corpora in translator education,

S. Bernardini, D. Stewart and F. Zanettin (eds), 55–70. Manchester: St. Jerome.
Zanettin, F. 1994. “Parallel Words: Designing a Bilingual Database for Translation Activities”. In
UCREL Technical Papers, Vol. 4: Corpora in Language Education and Research: A Selection
of Papers from TALC’94, A. Wilson and T. McEnery (eds), 99–111.
Benjamins Translation Library
A complete list of titles in this series can be found on www.benjamins.com
84 Monacelli, Claudia: Self-Preservation in Simultaneous Interpreting. Surviving the role.
xxi, 178 pp. + index. Expected April 2009
83 Torikai, Kumiko: Voices of the Invisible Presence. Diplomatic interpreters in post-World War II Japan.
2009. x, 197 pp.
82 Beeby, Allison, Patricia Rodríguez Inés and Pilar Sánchez-Gijón (eds.): Corpus Use and
Translating. Corpus use for learning to translate and learning corpus use to translate. 2009. x, 151 pp.
81 Milton, John and Paul Bandia (eds.): Agents of Translation. 2009. vi, 337 pp.
80 Hansen, Gyde, Andrew Chesterman and Heidrun Gerzymisch-Arbogast (eds.): Efforts and
Models in Interpreting and Translation Research. A tribute to Daniel Gile. 2009. ix, 302 pp.
79 Yuste Rodrigo, Elia (ed.): Topics in Language Resources for Translation and Localisation. 2008.
xii, 220 pp.
78 Chiaro, Delia, Christine Heiss and Chiara Bucaria (eds.): Between Text and Image. Updating
research in screen translation. 2008. x, 292 pp.
77 Díaz Cintas, Jorge (ed.): The Didactics of Audiovisual Translation. 2008. xii, 263 pp. (incl. CD-Rom).
76 Valero-Garcés, Carmen and Anne Martin (eds.): Crossing Borders in Community Interpreting.
Definitions and dilemmas. 2008. xii, 291 pp.
75 Pym, Anthony, Miriam Shlesinger and Daniel Simeoni (eds.): Beyond Descriptive Translation
Studies. Investigations in homage to Gideon Toury. 2008. xii, 417 pp.
74 Wolf, Michaela and Alexandra Fukari (eds.): Constructing a Sociology of Translation. 2007. vi, 226 pp.
73 Gouadec, Daniel: Translation as a Profession. 2007. xvi, 396 pp.
72 Gambier, Yves, Miriam Shlesinger and Radegundis Stolze (eds.): Doubts and Directions in
Translation Studies. Selected contributions from the EST Congress, Lisbon 2004. 2007. xii, 362 pp. [EST
Subseries 4]
71 St-Pierre, Paul and Prafulla C. Kar (eds.): In Translation – Reflections, Refractions, Transformations.
2007. xvi, 313 pp.
70 Wadensjö, Cecilia, Birgitta Englund Dimitrova and Anna-Lena Nilsson (eds.): The Critical
Link 4. Professionalisation of interpreting in the community. Selected papers from the 4th International
Conference on Interpreting in Legal, Health and Social Service Settings, Stockholm, Sweden, 20-23 May
2004. 2007. x, 314 pp.
69 Delabastita, Dirk, Lieven D’hulst and Reine Meylaerts (eds.): Functional Approaches to
Culture and Translation. Selected papers by José Lambert. 2006. xxviii, 226 pp.
68 Duarte, João Ferreira, Alexandra Assis Rosa and Teresa Seruya (eds.): Translation Studies at the
Interface of Disciplines. 2006. vi, 207 pp.
67 Pym, Anthony, Miriam Shlesinger and Zuzana Jettmarová (eds.): Sociocultural Aspects of
Translating and Interpreting. 2006. viii, 255 pp.
66 Snell-Hornby, Mary: The Turns of Translation Studies. New paradigms or shifting viewpoints? 2006.
xi, 205 pp.
65 Doherty, Monika: Structural Propensities. Translating nominal word groups from English into German.
2006. xxii, 196 pp.
64 Englund Dimitrova, Birgitta: Expertise and Explicitation in the Translation Process. 2005.
xx, 295 pp.
63 Janzen, Terry (ed.): Topics in Signed Language Interpreting. Theory and practice. 2005. xii, 362 pp.
62 Pokorn, Nike K.: Challenging the Traditional Axioms. Translation into a non-mother tongue. 2005.
xii, 166 pp. [EST Subseries 3]
61 Hung, Eva (ed.): Translation and Cultural Change. Studies in history, norms and image-projection. 2005.
xvi, 195 pp.
60 Tennent, Martha (ed.): Training for the New Millennium. Pedagogies for translation and interpreting.
2005. xxvi, 276 pp.
59 Malmkjær, Kirsten (ed.): Translation in Undergraduate Degree Programmes. 2004. vi, 202 pp.
58 Branchadell, Albert and Lovell Margaret West (eds.): Less Translated Languages. 2005. viii, 416 pp.
57 Chernov, Ghelly V.: Inference and Anticipation in Simultaneous Interpreting. A probability-prediction
model. Edited with a critical foreword by Robin Setton and Adelina Hild. 2004. xxx, 268 pp. [EST Subseries
2]
56 Orero, Pilar (ed.): Topics in Audiovisual Translation. 2004. xiv, 227 pp.
55 Angelelli, Claudia V.: Revisiting the Interpreter’s Role. A study of conference, court, and medical
interpreters in Canada, Mexico, and the United States. 2004. xvi, 127 pp.
54 González Davies, Maria: Multiple Voices in the Translation Classroom. Activities, tasks and projects.
2004. x, 262 pp.
53 Diriker, Ebru: De-/Re-Contextualizing Conference Interpreting. Interpreters in the Ivory Tower? 2004.
x, 223 pp.
52 Hale, Sandra: The Discourse of Court Interpreting. Discourse practices of the law, the witness and the
interpreter. 2004. xviii, 267 pp.
51 Chan, Leo Tak-hung: Twentieth-Century Chinese Translation Theory. Modes, issues and debates. 2004.
xvi, 277 pp.
50 Hansen, Gyde, Kirsten Malmkjær and Daniel Gile (eds.): Claims, Changes and Challenges in
Translation Studies. Selected contributions from the EST Congress, Copenhagen 2001. 2004. xiv, 320 pp.
[EST Subseries 1]
49 Pym, Anthony: The Moving Text. Localization, translation, and distribution. 2004. xviii, 223 pp.
48 Mauranen, Anna and Pekka Kujamäki (eds.): Translation Universals. Do they exist? 2004. vi, 224 pp.
47 Sawyer, David B.: Fundamental Aspects of Interpreter Education. Curriculum and Assessment. 2004.
xviii, 312 pp.
46 Brunette, Louise, Georges L. Bastin, Isabelle Hemlin and Heather Clarke (eds.): The Critical
Link 3. Interpreters in the Community. Selected papers from the Third International Conference on
Interpreting in Legal, Health and Social Service Settings, Montréal, Quebec, Canada 22–26 May 2001. 2003.
xii, 359 pp.
45 Alves, Fabio (ed.): Triangulating Translation. Perspectives in process oriented research. 2003. x, 165 pp.
44 Singerman, Robert: Jewish Translation History. A bibliography of bibliographies and studies. With an
introductory essay by Gideon Toury. 2002. xxxvi, 420 pp.
43 Garzone, Giuliana and Maurizio Viezzi (eds.): Interpreting in the 21st Century. Challenges and
opportunities. 2002. x, 337 pp.
42 Hung, Eva (ed.): Teaching Translation and Interpreting 4. Building bridges. 2002. xii, 243 pp.
41 Nida, Eugene A.: Contexts in Translating. 2002. x, 127 pp.
40 Englund Dimitrova, Birgitta and Kenneth Hyltenstam (eds.): Language Processing and
Simultaneous Interpreting. Interdisciplinary perspectives. 2000. xvi, 164 pp.
39 Chesterman, Andrew, Natividad Gallardo San Salvador and Yves Gambier (eds.):
Translation in Context. Selected papers from the EST Congress, Granada 1998. 2000. x, 393 pp.
38 Schäffner, Christina and Beverly Adab (eds.): Developing Translation Competence. 2000.
xvi, 244 pp.
37 Tirkkonen-Condit, Sonja and Riitta Jääskeläinen (eds.): Tapping and Mapping the Processes
of Translation and Interpreting. Outlooks on empirical research. 2000. x, 176 pp.
36 Schmid, Monika S.: Translating the Elusive. Marked word order and subjectivity in English-German
translation. 1999. xii, 174 pp.
35 Somers, Harold (ed.): Computers and Translation. A translator's guide. 2003. xvi, 351 pp.
34 Gambier, Yves and Henrik Gottlieb (eds.): (Multi) Media Translation. Concepts, practices, and
research. 2001. xx, 300 pp.
33 Gile, Daniel, Helle V. Dam, Friedel Dubslaff, Bodil Martinsen and Anne Schjoldager
(eds.): Getting Started in Interpreting Research. Methodological reflections, personal accounts and advice
for beginners. 2001. xiv, 255 pp.
32 Beeby, Allison, Doris Ensinger and Marisa Presas (eds.): Investigating Translation. Selected papers
from the 4th International Congress on Translation, Barcelona, 1998. 2000. xiv, 296 pp.
31 Roberts, Roda P., Silvana E. Carr, Diana Abraham and Aideen Dufour (eds.): The Critical
Link 2: Interpreters in the Community. Selected papers from the Second International Conference on
Interpreting in legal, health and social service settings, Vancouver, BC, Canada, 19–23 May 1998. 2000.
vii, 316 pp.
30 Dollerup, Cay: Tales and Translation. The Grimm Tales from Pan-Germanic narratives to shared
international fairytales. 1999. xiv, 384 pp.
29 Wilss, Wolfram: Translation and Interpreting in the 20th Century. Focus on German. 1999. xiii, 256 pp.
28 Setton, Robin: Simultaneous Interpretation. A cognitive-pragmatic analysis. 1999. xvi, 397 pp.
27 Beylard-Ozeroff, Ann, Jana Králová and Barbara Moser-Mercer (eds.): Translators'
Strategies and Creativity. Selected Papers from the 9th International Conference on Translation and
Interpreting, Prague, September 1995. In honor of Jiří Levý and Anton Popovič. 1998. xiv, 230 pp.
26 Trosborg, Anna (ed.): Text Typology and Translation. 1997. xvi, 342 pp.
25 Pollard, David E. (ed.): Translation and Creation. Readings of Western Literature in Early Modern
China, 1840–1918. 1998. vi, 336 pp.
24 Orero, Pilar and Juan C. Sager (eds.): The Translator's Dialogue. Giovanni Pontiero. 1997. xiv, 252 pp.
23 Gambier, Yves, Daniel Gile and Christopher Taylor (eds.): Conference Interpreting: Current Trends
in Research. Proceedings of the International Conference on Interpreting: What do we know and how? 1997.
iv, 246 pp.
22 Chesterman, Andrew: Memes of Translation. The spread of ideas in translation theory. 1997. vii, 219 pp.
21 Bush, Peter and Kirsten Malmkjær (eds.): Rimbaud's Rainbow. Literary translation in higher
education. 1998. x, 200 pp.
20 Snell-Hornby, Mary, Zuzana Jettmarová and Klaus Kaindl (eds.): Translation as Intercultural
Communication. Selected papers from the EST Congress, Prague 1995. 1997. x, 354 pp.
19 Carr, Silvana E., Roda P. Roberts, Aideen Dufour and Dini Steyn (eds.): The Critical Link:
Interpreters in the Community. Papers from the 1st international conference on interpreting in legal, health
and social service settings, Geneva Park, Canada, 1–4 June 1995. 1997. viii, 322 pp.
18 Somers, Harold (ed.): Terminology, LSP and Translation. Studies in language engineering in honour of
Juan C. Sager. 1996. xii, 250 pp.
17 Poyatos, Fernando (ed.): Nonverbal Communication and Translation. New perspectives and challenges
in literature, interpretation and the media. 1997. xii, 361 pp.
16 Dollerup, Cay and Vibeke Appel (eds.): Teaching Translation and Interpreting 3. New Horizons.
Papers from the Third Language International Conference, Elsinore, Denmark, 1995. 1996. viii, 338 pp.
15 Wilss, Wolfram: Knowledge and Skills in Translator Behavior. 1996. xiii, 259 pp.
14 Melby, Alan K. and Terry Warner: The Possibility of Language. A discussion of the nature of language,
with implications for human and machine translation. 1995. xxvi, 276 pp.
13 Delisle, Jean and Judith Woodsworth (eds.): Translators through History. 1995. xvi, 346 pp.
12 Bergenholtz, Henning and Sven Tarp (eds.): Manual of Specialised Lexicography. The preparation
of specialised dictionaries. 1995. 256 pp.
11 Vinay, Jean-Paul and Jean Darbelnet: Comparative Stylistics of French and English. A methodology
for translation. Translated and edited by Juan C. Sager and M.-J. Hamel. 1995. xx, 359 pp.
10 Kussmaul, Paul: Training the Translator. 1995. x, 178 pp.
9 Rey, Alain: Essays on Terminology. Translated by Juan C. Sager. With an introduction by Bruno de Bessé.
1995. xiv, 223 pp.
8 Gile, Daniel: Basic Concepts and Models for Interpreter and Translator Training. 1995. xvi, 278 pp.
7 Beaugrande, Robert de, Abdullah Shunnaq and Mohamed Helmy Heliel (eds.): Language,
Discourse and Translation in the West and Middle East. 1994. xii, 256 pp.
6 Edwards, Alicia B.: The Practice of Court Interpreting. 1995. xiii, 192 pp.
5 Dollerup, Cay and Annette Lindegaard (eds.): Teaching Translation and Interpreting 2. Insights,
aims and visions. Papers from the Second Language International Conference Elsinore, 1993. 1994.
viii, 358 pp.
4 Toury, Gideon: Descriptive Translation Studies – and beyond. 1995. viii, 312 pp.
3 Lambert, Sylvie and Barbara Moser-Mercer (eds.): Bridging the Gap. Empirical research in
simultaneous interpretation. 1994. 362 pp.
2 Snell-Hornby, Mary, Franz Pöchhacker and Klaus Kaindl (eds.): Translation Studies: An
Interdiscipline. Selected papers from the Translation Studies Congress, Vienna, 1992. 1994. xii, 438 pp.
1 Sager, Juan C.: Language Engineering and Translation. Consequences of automation. 1994. xx, 345 pp.

Corpus Use and Translating

Benjamins Translation Library (BTL)

General Editor Associate Editor Honorary Editor

John Benjamins Publishing Company

American National Standard for Information Sciences – Permanence of

Library of Congress Cataloging-in-Publication Data

© 2009 – John Benjamins B.V.

List of editors and contributors vii

Using corpora and retrieval software as a source of materials

Safeguarding the lexicogrammatical environment:

Are translations longer than source texts? A corpus-based study

Arriving at equivalence: Making a case for comparable general

Virtual corpora as documentation resources: Translating travel

Developing documentation skills to build do-it-yourself corpora

Subject index 151

Allison Beeby Patricia Rodríguez Inés

Ana Frankenberg-Garcia Gill Philip

yet become widely established among professional translators. Regardless of its

Allison Beeby, Patricia Rodríguez Inés and Pilar Sánchez-Gijón

Corpus Use and Translating is mainly addressed to those interested in translation

Furthermore, other authors in previous CULT publications have published exten-

Within a task-based methodological framework, the authors present four

Josep Marco and Heike van Lawick

Key words: Translator training, corpora, task-based approach, corpus-based,

1. The role of corpus-related resources in translator training

According to experts in second language acquisition (Partington 1998: 5–7; Aston

The same distinction applies to corpus-related resources when used in a transla-

potential of learner corpora, which in a translator training environment would be

2. Pedagogic assumptions: Objectives and methodology

negotiation of different text types and, especially, genres.

and Pujol 2003) devotes a long chapter to “elements of cross-linguistic contrast”.

odology is rooted in the communicative approach to second language learning,

3. Designing corpus-related translation tasks

Translator trainers can draw on various corpora in order to elaborate translation

3.1 Cloze tests based on a bilingual corpus

3. On this point, see also Kelly (2005: 16–17).

The cognitive approach to polysemous phenomena enables us to see this kind

3.2 Multiple choice exercises based on a learner corpus

3.3 Translation of short passages yielded by the concordancer

3.4 Concordance analysis

As to the second aspect mentioned above – norms governing translation decisions

Hurtado, A. (dir.). 1999. Enseñar a traducir. Metodología en la formación de traductores e inté-

Appendix 1. Tasks with als and wenn for the German-Catalan

1. Kleine Strohhütten dienen den Tieren als Unterkunft.

2. Fill in the gaps with an adequate translation into Catalan

Im Februar 1918 verlor Baranowicz den Al febrer de 1918, Baranowicz, __________

Appendix 2. A multiple choice task based on a corpus

Appendix 3. Tasks with now for the English-Catalan translation classroom

1. Identify as many meanings of now as you can in the following excerpts:

2. Translate the passages in the previous task.

Appendix 4. References to text in COVALT corpus

Conrad, J. 1990. Typhoon, and Other Stories. London: Penguin.

Key words: Translation teaching, semantic prosody, intuition, empirical data,

These days, translation theorists advise us to go holistic. Translators should be

1. Studies on semantic prosody

1.1 Studies on semantic prosody in corpus linguistics

follows, in this case something undesirable (co-occurrences of symptomatic of in

1.2 Studies on semantic prosody in translation

Tognini-Bonelli (2001: 113–128, 2002) uses corpus data to compare semantic

2. The teaching module: Methodology and results

i. The air of the room chilled his shoulders

I selected this sentence because it seemed to me to represent a switch of mood

This produced a more manageable selection of 46 occurrences, though it included

reflexivity can be a determining feature in arriving at translation equivalence

present a systematic methodology for the creation of a virtual corpus of travel