There are, however, other useful applications of corpus linguistics for working translators. One interesting approach is described by Maher, Waller and Kerans in the July 2008 issue of the Journal of Specialised Translation in an article titled "Acquiring or enhancing a translation specialism: the monolingual corpus-guided approach".
The article discusses practical ways in which working translators can use concordancers and desktop-based indexers to acquire or enhance linguistic expertise for special subjects in their target languages. The target readers for the article are:
- novice translators seeking to specialize
- experienced generalists who want to go "up market" with a specialty
- translators who wish to enhance their subject-area expertise for a special client
- translators working in a team who need to harmonize their use of language
After reading the article, I downloaded the tools and tested them. I was very impressed. Archivarius is much better than Copernic, which I have used for some time - a key difference is that it can deal with morphology in 18 languages. I personally only care about two of these, but it ought to make many translators happy. A 30-day fully functional trial with 99 launches is available, and individual licences range between about 20 euros for students and 45 euros for businesses. (Maybe a freelancer qualifies for the 30 euro "personal" license - that wasn't clear to me when I looked at the web site. I'll find out, however, because I will license this tool!) Dealing with morphology means, for example, that I can search "gleich" and get "gleiche", "gleicher", "gleichen" and "gleiches" in German.
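To make the morphology point concrete, here is a minimal Python sketch of what such a search does in effect. It is not how Archivarius works internally (I have no idea what it does under the hood); the fixed list of endings and the function names are my own assumptions for illustration, and a real indexer would use proper morphological analysis rather than a suffix list.

```python
import re

# Assumed, hand-picked list of German adjective endings (illustration only).
ENDINGS = ("e", "er", "en", "es", "em")


def inflected_pattern(stem):
    """Build a regex matching the bare stem or the stem plus one listed ending."""
    alternatives = "|".join(re.escape(ending) for ending in ENDINGS)
    return re.compile(
        r"\b" + re.escape(stem) + r"(?:" + alternatives + r")?\b",
        re.IGNORECASE,
    )


def find_forms(stem, text):
    """Return every matched word form in the order it appears in the text."""
    return [match.group(0) for match in inflected_pattern(stem).finditer(text)]


sample = (
    "Der gleiche Text, ein gleicher Fehler, die gleichen Regeln, "
    "gleiches Recht, gleich danach, eine Gleichung."
)
print(find_forms("gleich", sample))
```

The word boundaries keep unrelated words such as "Gleichung" out of the results, which a plain substring search would not.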
The article has a nice discussion of access to free, readily available texts. I also discovered in my research that there are large corpora covering specialist domains available for free in some languages. The American National Corpus is one example - I found a Berlitz travel corpus there with over a million words. Not my interest, but for someone who specializes in tourism or wants to, this is probably useful. The authors put together a special corpus for corporate financial reports using publicly available documents, and other examples were given.
The discussion of sampling adequacy is very valuable in my opinion. This is a question which has nagged me for a long time; the several books on the subject of corpus linguistics which are in my library dance around this issue and never commit to hard numbers that I can do something with. I am grateful to the authors for sticking their necks out and saying, for example, that while 40,000 words might be an adequate basis for a language teacher wanting to get started in a specialist area, a translator's linguistic questions probably won't be usefully addressed with less than about 250,000 words, with 500,000 being the point where things really start to get good.
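If you want to check your own collection against those figures, a rough word count is enough. The sketch below is only that: the three thresholds come from the article as quoted above, while the function names, the wording of the verdicts and the crude definition of a "word" (any run of letters or digits) are my own assumptions.

```python
import re

# Approximate thresholds quoted from the article, in words.
TEACHER_MINIMUM = 40_000
TRANSLATOR_MINIMUM = 250_000
TRANSLATOR_GOOD = 500_000


def count_words(text):
    """Crude word count: every run of letters or digits counts as one word."""
    return len(re.findall(r"\w+", text))


def assess_size(total_words):
    """Place a corpus size relative to the article's rough thresholds."""
    if total_words >= TRANSLATOR_GOOD:
        return "where things start to get good for a translator"
    if total_words >= TRANSLATOR_MINIMUM:
        return "probably usable for a translator's questions"
    if total_words >= TEACHER_MINIMUM:
        return "perhaps enough for a language teacher, thin for a translator"
    return "too small for either purpose"


texts = ["Net income rose by 5 percent.", "Revenue fell in the second quarter."]
total = sum(count_words(text) for text in texts)
print(total, "words:", assess_size(total))
```

In practice you would read the texts from the files in your corpus folder instead of from a list of strings.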
The authors use a practical model that combines a high-quality base or "substrate corpus" with Q&D (quick-and-dirty) corpora. The substrate corpus is carefully selected, cleaned of reference lists, non-linguistic content and extra spaces (these screw up frequency counts for phrases, not to mention their identification), and then maintained. Q&D corpora cover specific topics for a current job and the like; those of a million words or more can be assembled in minutes using online corpus collectors, such as the Sketch Engine. The article gives good, practical advice on balancing these two types of resources and on how they can and should be stored on your hard drive.
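The remark about extra spaces is easy to demonstrate. This is a small sketch of my own, not the authors' cleaning procedure: the sample text and function names are invented, and collapsing every run of whitespace (line breaks included) into a single space is a simplification that suits phrase counting but throws away paragraph structure.

```python
import re


def normalize_spaces(text):
    """Collapse each run of whitespace into one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def phrase_count(text, phrase):
    """Count exact, non-overlapping occurrences of a phrase."""
    return text.count(phrase)


# Invented sample: a double space and a tab hide two of the three occurrences.
raw = "net  income rose; net income fell; net\tincome was flat"

print(phrase_count(raw, "net income"))                    # prints 1
print(phrase_count(normalize_spaces(raw), "net income"))  # prints 3
```

The same phrase is found once in the raw text and three times after cleaning, which is exactly the kind of distortion that makes frequency counts unreliable.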
The discussion of "fair use" is thoughtful. I agree with it, but others, including some lawyers, may not. These topics get debated in public forums a lot, and they have also been the subject of articles in professional journals that raise intellectual property issues regarding translations and translation memories. For those with an interest in such topics, there is enough out there to keep you busy reading for months. I tend to be cautious and share resources only when I am sure no legal objections will be raised.
The authors offer practical advice on storing and organizing corpora, including the importance of naming conventions for files and maintaining a log of corpora. This advice should be read carefully, as it reflects some hard-won experience.
In their conclusions, the authors emphasize that this approach not only has value for compensating for uneven or insufficient knowledge of a field, genre or register, but it can also be important for counteracting source language interference for people like me who live in the country where the source language is spoken and do not have daily contact with speakers and the culture of the target language. That's a valuable point, because many of us have observed such problems with ourselves or others. (If you haven't, you're either a hermit or just incredibly dense.)
Given how often the question of specialization and how to acquire it is raised on forums like Translator's Café or ProZ, I think the article can help an enormous number of translators improve their situation. I particularly appreciated the clear language of the article and its example-based, practical advice. Many people who try to investigate the topic of corpus linguistics, especially those coming to translation from career or educational backgrounds other than languages or linguistics, get snowed under too quickly in a blizzard of academic bullshit. This is an article that you can read in an hour and apply in a useful way in the next hour.