Apr 26, 2012

Twitterview: SDL Trados Studio, memoQ, DVX2 and PDF extraction

When I began using Twitter somewhat hesitantly three years ago, I never expected that it would eventually prove to be one of the most useful social media tools for gathering information of professional value. Much of this is serendipitous; I really never know what will come floating down the twitstream or where some of the conversations in it will go. Like the direct chat I had with a colleague in New Zealand about features she liked best in the two main CAT tools she uses, SDL Trados Studio and memoQ.

We both really appreciate the TM-driven segmentation in memoQ and the superior leverage this offers. But to my surprise, she expressed a preference for SDL Trados Studio, particularly for the quality of its PDF text extractions from electronically generated files. This is not a feature I make heavy use of in either tool, though I have used it more often lately in memoQ for alignments in the LiveDocs module and found it generally satisfactory. Most of my work involving PDF files is with scanned documents - there one has no choice but to use a good OCR tool like OmniPage or ABBYY FineReader.

So I was quite intrigued by the claim that the quality of PDF text extraction was "better" than from standalone tools, especially since my experience is quite different. Further discussion (not shown in the graphic) revealed that what she actually meant was that the quality of the text extraction with the CAT tool usually beat the quality of text received from translation agencies that performed conversions. That is easy to explain, really. In my experience, most agencies are clueless about how to use conversion tools and too often use automated settings and save the results "with layout". This is very often utterly unsuited for work with translation environment tools or requires a lot of cleanup and code zapping.

For years I have recommended to agencies and colleagues that they spare themselves a lot of headaches by saving PDF conversions as plain text and adding any desired formatting later. Most people ignore that advice and suffer accordingly. So in a way, a CAT tool that does so encourages "best practice" for PDF translation for those files they are actually able to handle.

Encouraged by the Twitter exchange, I decided to do a few tests with files from recent projects. I took a PDF I had with various IFRS-related texts from EU publications. It appeared to extract quickly and cleanly in memoQ, giving me a translation grid full of nicely segmented text. SDL Trados Studio 2009 choked badly on it and extracted nothing. I was told that her extraction in SDL Trados Studio 2011 caused a timeout in the project, but the text itself was completely extracted and converted to DOCX format. This is useful because, unlike the extraction to plain text in memoQ, it offers the possibility to add or change some text formatting in the translation grid. Other extraction examples from SDL Trados Studio 2011 showed that text formatting was preserved.

A closer examination of the extracted texts revealed problems with both the memoQ and Trados Studio extractions. The memoQ 5 PDF text extraction engine proved incapable of handling text in multiple columns properly: the paragraph order was all fouled up. The extraction with SDL Trados Studio had a great number of superfluous spaces. I do not know whether this can be optimized somehow in the settings. The results of all the extraction tests are downloadable here in a 6 MB ZIP file. I've included the SDL Trados Studio extraction saved to plain text as well for a better comparison of the text order and surplus space problems.
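The surplus spaces, at least, can usually be cleaned up mechanically after extraction to plain text. This was not part of my tests, but as a sketch (the function name and sample text are my own invention), a few lines of Python with a regular expression will collapse runs of spaces and tabs inside each line without disturbing the paragraph breaks:

```python
import re

def collapse_spaces(text: str) -> str:
    """Collapse runs of spaces/tabs left behind by PDF text
    extraction, leaving line breaks (and thus paragraphs) intact."""
    cleaned = []
    for line in text.splitlines():
        # Any run of spaces or tabs inside a line becomes one space
        cleaned.append(re.sub(r"[ \t]+", " ", line).strip())
    return "\n".join(cleaned)

sample = "This   is  an   extracted    sentence.\nNext   paragraph."
print(collapse_spaces(sample))
# This is an extracted sentence.
# Next paragraph.
```

A pass like this will not, of course, repair the garbled paragraph order of a multi-column extraction; that problem has to be dealt with in the conversion itself.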

Overall, I am personally not very pleased with the results of the text extractions from PDF in either tool. The results from SDL Trados Studio are clearly better, and other examples that were shared made it clear that this tool does a better job than many an untrained PM armed with superior PDF conversion software. It is certainly much better than the solutions I see many translators using. But really, nothing beats good OCR software, an understanding of how to use it well and a proper workflow for producing a good TM and a target file fit for most purposes.

*****

Update 2012-05-22: I met colleague Victor Dewsbery at a recent gathering in Berlin, and he told me about his tests with the recently introduced PDF import feature of Atril's Déjà Vu X2 translation environment. He kindly offered to share his results (available for download here) and wrote:

Here is the result of the PDF>DVX2>RTF>ZIP process for your monster EU PDF file. Comments on the process and the result:
  • The steps involved were: 1. import the file into DVX2 as a PDF file; 2. mark all segments and copy source to target; 3. export the file as if it were a translated file (it comes out as an RTF file). The RTF file is 20 MB in size and zips to 3 MB.
  • Steps 1 and 3 took a long time, and DVX2 claimed to be not responding. For step 1 I just left it and it eventually came up with the goods. Step 3 exported the RTF file perfectly, even though DVX2 claimed that the export had not finished. I was able to open the RTF file (it was locked, but I simply renamed it), and this is the version which I enclose. Half an hour later DVX2 had still not ended the export process (and had to be closed via the Task Manager), although the exported file was in fact perfectly OK. The procedure worked more smoothly with a couple of smaller PDF files. Atril is working on streamlining the process and ironing out the glitches in the process, especially the “not responding” messages.
  • The result actually looks very good to me. There are hardly any codes in the DVX2 project file (the import routine also integrates CodeZapper). I didn’t spot any mistakes in the sequence of the text. Indented sections with numbering seem to be formatted properly - i.e. with tabs and without any multiple spaces.
  • The top and bottom page boundaries in the exported file are too wide, so most pages run over and the document has over 900 pages instead of just under 500. Marking the whole document and dragging the header/footer spaces in Word seems to fix this fairly quickly.
  • I note that some headlines are made up of individual letters with spaces between them. This may be related to the German habit of using letter spacing (“Sperrschrift”) for emphasis as an alternative to bold type.
  • I found one instance where text was chopped up into a table on page 857 of the file.
  • There are occasional arbitrary jumps in type size and right/left page boundaries between sections.
On the strength of this sample, it would usually be OK to simply import the PDF file into DVX2, translate in the normal way, and then fix any formatting problems in the exported file.
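One of the problems Victor noted, the letter-spaced (Sperrschrift) headlines, can also often be repaired with a simple pattern replacement after export. The following Python sketch is my own, not part of Victor's tests; note that it would also falsely join genuine runs of single-letter words, so the results need review:

```python
import re

# A run of three or more single "word" characters separated by single
# spaces, e.g. "E i n l e i t u n g" as letter-spaced emphasis
LETTERSPACED = re.compile(r"\b(?:\w ){2,}\w\b")

def join_letterspaced(text: str) -> str:
    """Rejoin letter-spaced (Sperrschrift) words from PDF extraction."""
    return LETTERSPACED.sub(lambda m: m.group(0).replace(" ", ""), text)

print(join_letterspaced("1. E i n l e i t u n g"))  # -> 1. Einleitung
```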



Apr 23, 2012

Five favorite things about the CAT

When I listen to colleagues talk about the tools they use, I find it interesting how diverse the points are that they emphasize when describing the advantages of their environments. I'm not surprised, really, because the needs of individuals vary a lot as do the projects they may encounter in different phases of their careers. And the limits of a tool itself influence the "advantages", of course.

So I sent off a few e-mails to some friends, asking what they felt the "top five" features of their choice of CAT tools are for their purposes. I'm still waiting for a few responses, but I'd like to share what I've heard so far. I'll add more as and when I get more feedback. Others are welcome to add their Top 5s in the comments.

OmegaT
The respondent here uses this tool exclusively and does a fine job of correcting me every time I put my foot in my mouth with a misstatement about the capabilities of the software. He wrote with his typical humility:
I don't think I'm the right person to do this. Mainly because OmegaT is the only tool I've used for the last nine years.

Favourite features:
  • It runs on Linux.
  • It does what it says on the box.
  • It *always* works. It doesn't hang or crash. I can't remember when it last did, if it ever did. Bugs are rare and when they do happen tend to concern secondary features that are still at the beta stage.
  • It's fast; almost everything is near-instantaneous. (An exception is the "Search files" function, though I don't use that much and I doubt many others do, either.)
  • "Upgrading" means downloading and unpacking, that's it.
Not very sexy, sorry. Like my '99 Mazda 626 wagon, it just gets the job done.
Just gets the job done? Well, what more could one want? There are plenty of tools that often don't manage that. I've seen huge improvements in this environment in recent years, and there are certainly worse tools to start with. It is Open Source but worth taking seriously for professional work.

SDL Trados Studio
The respondent here is still using Studio 2009, so surely more notable features will be discovered after an upgrade to Studio 2011. But for his team of top-notch translators, the three he indicated are pretty persuasive:
  • AutoSuggest (which is apparently so good that Kilgray will implement it in the next memoQ version, a rare exception to emulation that more often runs the other way)
  • File format filters. (I have often used SDL filters to prepare content to translate in other environments)
  • Project package sharing. (This is useful in any environment which offers it, but there is a need for all vendors to get off their butts and support interoperable package standards.)
Déjà Vu
See Victor Dewsbery's comment below. DVX used to be my favorite tool, one whose innovations have still to be matched in some respects by any of the competition. Its visionary software architect is arguably one of the greatest contributors ever to the development of user-friendly CAT tools. I am not personally familiar with the current version of the software nor with the server solution which was finally released.

memoQ
I spend far too much time on my blog talking about my fave features of this tool, so I asked a few others to have their say. One said that the things she found most helpful were
  • PDF alignment (although it doesn't always work)
  • The infinitely customizable interface - fonts, colors, sizes of windows, placement of windows, horizontal/vertical split, etc.
  • Extensive [bilingual] export options (memoQ bilingual, XLIFF, Trados, two-column RTF)
  • Lean software - version 5.0.62 is under 26 MB to download, even smaller than version 3.0.37, which was nearly 35 MB. Compare this to the bloat of SDL - currently 338 MB if my research is correct.
  • "Duh" comments: very responsive support, frequent new releases and features, inexpensive

 *******

Do you use one of these tools or another and want to share the features which help you most to be productive or which just put a big smile on your face? Have your say in the comments.



Apr 22, 2012

Spamalot: desperate translation agencies


Soon after I started this blog in 2008, I found it necessary to take various measures to protect against spam comments. In the early days, the spam was much the same as what somehow makes its way into my e-mail inbox: usually real estate scams, offers of wondrous herbal mixtures to supersize various anatomical parts or God-knows-what in Chinese. Spam filters have become sophisticated enough to catch most of that now, so the trash encountered most of the time now appears to have a bit more human intelligence behind it. That is, if you associate bottom-feeding translation agencies with human intelligence in some misguided way.

I have a definite impression that the competition in the race to the bottom for low prices and garbage quality has led some of these agencies to think they can improve their position in the global dog race by getting backlinks on popular translation blogs. This is done in comments on blog posts, sometimes as a link embedded in the rather vacuous text (as Rosetta once did to a New York agency rival's blog below) or in the link associated with the poster's name (as "Cassy" - surely not its real name - tried above on behalf of "LanguageTran"). What started out as the practice of a few sleazy Pakistani agencies years ago seems to have become routine for a number of UK and US bottom-feeders.


I've seen this happen on almost every translation blog I read. There are so many attempts made on my blog that I make it a point to check the name link on every comment that seems a bit "vacant", and a very large number get flushed to where they belong.

I'm not against comments from agency owners or personnel - quite the contrary. These people are a legitimate part of our business and have a lot to contribute to discussions - and many do. I love it when they or others have something interesting to say in a post discussion, particularly if they can point out an error on my part or add some useful point. But if the point of the visit is simply to post drivel in the hopes of getting a link to drive traffic to your agency's site or improve its search engine ranking, y'all can save yourselves the trouble because you will most likely not be getting through.

Apr 21, 2012

TM Follies


A recent comment by Iwan Davies on Twitter, revealing a reviewer's rather odd notions of the requirements that the use of a translation memory tool imposes on translation, led me to reflect with a friend on some of the very strange and wrong ideas about such technology that persist in some minds. In the twitstream discussion, the reviewer's stupid notion that each segment in a translation must stand on its own without context provoked an interesting flurry of responses, ranging from the astute observation from @PaulAppleyard that "if you wanted to translate segments as 'standalone', then you would work in a random segment file, not a text that flows..." to some rather disturbing remarks from a few to the effect of "this is why I don't like to use such tools". Various people pointed out that modern translation environment tools such as SDL Trados Studio, OmegaT and memoQ make use of context in their translation memories to avoid the problems of more primitive systems which, in the hands of translating monkeys, too often result in matches being used in very inappropriate ways.

The list I could compile of wrong-headed ideas about TMs is a long one, and I would probably only capture ten percent of the foolishness on a lucky day. A few highlights in my memory include:
  • A statement by an otherwise respected colleague some years ago that translators must not sacrifice potential "leverage" by combining segments to make sense in the translation. This included cases where someone inserts
    carriage
    returns and line
    breaks into the sentence to
     make it fit in some odd space. In a source language like German, where word order is often very different than in a good English translation, this can quickly pollute a TM to the point of being worse than worthless. This in fact describes the real state of many "promiscuous" agency TMs that I have seen over the years. Fortunately, advanced features in modern translation memories, like memoQ's "TM-driven segmentation", encourage much better practice among smart service providers today.
  • The widespread notion that translation memory systems are only useful if one works on repetitive texts. I've got news for you: much of the repetitive stuff was outsourced to King Louie & Co. years ago. And yet I still find great value in working with good TMs. Why? A friend of mine summarized it nicely the other day when she talked about how she spent two hours researching a very obscure term for roadworks equipment in a minor European language: "The next time this comes up, I can find it right away and see the context." Indeed. I am amazed sometimes at the obscure technical terminology that comes out of my personal TM with its 12-year record of my work. Sometimes that amazement is even positive. An hour invested in researching a term and saving it in a TM (or much better: a proper termbase with metadata including domain, source and examples of use) probably means several more hours saved over the next few years. At least.
  • The idea that a translation memory is a reliable source of terminology and obviates the need to create and maintain termbases or proper glossaries. Wrong, wrong, wrong. Particular offenders in this regard are agencies with their brothel-like practices of letting any number of translators screw the end customers' texts. Do a concordance search to find the right term in one of those TMs? Riiiiiiiiiight. Even agencies I've worked with for years who have made a real effort to keep TMs clean can't keep the terms in them on the straight and narrow. And using TMs to replace a real termbase, even a limited one, sacrifices the enormous potential benefits of automated terminology QA procedures offered by some modern translation environments.
  • King Louie & Co. as well as many other agencies in the race to the bottom of the quality barrel truly believe that once a good TM has been established by top translators, the second- or third-tier team can take over at lower cost and keep the customer happy. Well... at the moment, the lock on my Volvo's rear hatch is broken. I could get it fixed by a mechanic on Monday, or I could follow my neighbor's suggestion, and just hold it shut with a bungee cord. And the next time a tail light cover gets broken, I could just tape some red or yellow plastic film over it. Replace the hubcap that flew off when I hit that pothole? Naw. But sooner or later, people will notice the difference and draw their own conclusions. Will those be good for business? Can a jobbing student equipped with a good TM really produce the quality of legal translation you can rely on before the court? Trägt er auch 'nen gold'nen Ring, der Affe bleibt.... (even with a golden ring, the monkey remains...)
I have noted over the years that most of the best clients never ask about translation memories or the tools related to them, even though a good number of them are aware of the technology and many of these use it. But these are the ones who understand that the monkeys who rely slavishly on CAT tools without the use of BAT* too often produce stale, stilted text unsuited for its communication purpose. At best. And all the king's machine translation engines won't change that.

Nonetheless, I believe there is great value for nearly all translators, even "creatives", in using advanced translation environment tools. But that value will not necessarily lie in the same methods or the same features. Calls to "throw away your TMs" with the introduction of advanced alignment technologies like Kilgray's LiveDocs in memoQ, which allow final edited versions of past documents to be incorporated quickly when a new version is to be translated, may be a bit premature, but they are often appropriate in my recent experience. And combinations of that with voice recognition technologies, term QA tools and other features offer a wealth of creative possibilities for taking the best and leaving the rest in our quest for better results and working conditions.


* brain-assisted translation

Apr 19, 2012

Kilgray survey until April 25th: ready, aim, respond!

Kilgray is conducting a survey until April 25th, the results of which will be discussed at the conference in May. This is an opportunity for users of memoQ and their other products to say what's working and what's not and perhaps influence the future course.

It's a short survey and should take little time to answer. The first question asks for three things they are doing right, the second asks which three things are wrong and need to stop, and further questions ask your opinion on the importance of maintaining independence (in contrast to SDL Trados, Star Transit or Wordfast, for example, which are owned by language service resellers that compete with agency customers and freelancers for business) and what "big idea" you might like to see implemented.

On several occasions in past years, I have seen Kilgray respond quickly to clearly expressed user feedback, and I believe that this is still a reasonable expectation. So have your say to help make some good tools better!

Apr 17, 2012

April translators' meetup in Lehnitz


Dear colleagues,

The next translators' meetup is coming up, namely on:

                Thursday, 19 April 2012, from 7:00 p.m.

After last month's break, we will once again go to:

                Restaurant Kellari
                Gutsplatz 1
                16515 Lehnitz/Oranienburg
                S-Bahn S1: Lehnitz

See you Thursday!
Andreas Linke


Preview:
The meetup after this one will take place as usual on the third Thursday of the month, namely on 17 May 2012.


Final checks in memoQ

"Having to do a separate final check in Word is a major MemoQ disadvantage over the Word/Trados Workbench (and Wordfast Classic) WYSIWYG procedure. It might even make some of us abandon MemoQ."
I read that statement in a recent digest from the Yahoogroups memoQ forum with some puzzlement. What exactly does the author of those words mean?


There are a few arguments I can muster in favor of the necessity of a final check in MS Word or another original format. The limitations of the spellchecker in memoQ are one of these. Even when the MS Word spellchecker is used, as I recall, memoQ (in the versions where I noticed this problem) did not flag doubled words, and I have a bad habit of typing "and and" and the like.


The use of style guide and consistency-checking tools like PerfectIt! is another good reason to do such external checks.


But when I do such things, I work on my second monitor and immediately incorporate changes in my memoQ project to keep the TM updated among other things. Also, the filters in memoQ enable me to examine the scope of some problems faster and with greater ease than multiple "Find" operations in a word processor or other software.


But if the person quoted meant simple ease of reading on the screen, I wonder if he has paid any attention to the optimal use of the memoQ translation preview. One could simply resize that pane after translation and read through a preview of the translation:


If a problem is found, clicking on the text will select it and cause the translation window (above the preview) to jump to the segment to be corrected. And of course this works for any format that yields a preview in memoQ, so you are not limited as you would be working with the Trados Workbench macros or Wordfast Classic in MS Word. Excel files, PowerPoint slides, HTML, ODT files and other formats enable you to work this way.


But another reason why I would hesitate strongly before regressing to the tools mentioned is that I would sacrifice the ability to do terminology checks with the QA module. (This is, of course, possible to a limited extent in TagEditor.) Or other QA checks which may be of interest. These features are severely underutilized, but they aren't hard to learn, and they offer considerable benefits to freelance translators in the competition for consistent formal quality.


Similar advantages are likely to be had from other recent versions of leading translation environment tools. Very often it pays to consider the points of difficulty we have with these and discuss them with other users, because often new and better ways of using them will come to light.

Apr 16, 2012

Another approach to OCR for translation

OCR is often a touchy subject for translators. There is unfortunately too little expertise in this area, though the practice of converting scanned text for translation is now quite common. And recent developments in tools such as ABBYY FineReader have catered to the worst of the idiocy I have seen, automatically starting processes which are best executed manually with greater care.

Too many people rely on automatic settings for OCR conversion and save the result in a format (usually an MS Word or RTF file) which more or less preserves the look of the original. The result may look pretty to the ignorant eye, but when the translator begins work, a host of problems may arise. In CAT tools, there are usually innumerable superfluous tags (which sometimes even CodeZapper cannot clean up), even embedded in the middle of words, which prevent matching with TM and glossary entries. Kiss consistency and quality control features goodbye in such cases. In older versions of Trados, font changes and other format trashing are common. Disappearing text in ill-defined text boxes and columns is often a problem, even for those who do not use translation environment tools.

For these reasons and others, I have long been an advocate of manual zone definition and (where necessary) the use of templates to achieve the best conversion results, and I usually save the results as naked text or, at most, with some font formatting preserved (in which case further adjustments by mass selection are usually necessary to ensure that body text is, for example, consistently 10 point and not 9 point or 10.5 point in some spots due to image distortions).

If the result of a translation will be given to a graphic artist for subsequent layout, you do in fact perform a good deed by avoiding the "save with layout" options for your OCR text. A document with a straight text flow is much easier to import into the layout environment (such as InDesign). Of course, where such things are to be done, it is often best to try to get the content in that environment's format in the first place, though in such cases, certain clean-up (of hyphenation, kerning and column breaks, for example) is necessary to avoid problems and tag checks are essential after translation.

However, when you work with OCR texts, no matter how they are created for translation, some errors are almost inevitable. When the OCR engine has a spellchecker and "intelligent" correction features, some of these errors may even be plausible, just wrong, so beware! It is vital to have a copy of the original scanned document as a printout or perhaps on a second working screen for reference. I have followed this approach for years. But when I have a good conversion, I might translate happily for a number of pages before encountering a whiskey tango foxtrot moment in which I must consult the original to see what the text really says. This was a real problem for me in multi-column patents with small type, where I often spent quite a bit of time looking in the scanned PDF for the relevant passage.

That was the case at least until one day, the light went on, and I realized that making a searchable PDF from the original scan could enable me to find the relevant text faster. If this searchable PDF is made from the same OCR process used to create the text to translate, then any errors will be the same, and by putting in the questionable text, you can go precisely to the right place in the document! My original post on this subject on another blog was primarily about making searchable PDFs for reference documents (to find terms and usage easily in documents not intended for translation), but I actually use this error-finding technique more often in my work lately.
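The principle behind this trick is simple enough to show in a few lines. In this sketch (the names are hypothetical, and the per-page OCR text is assumed to be already extracted), a questionable string is located by page; because the searchable PDF and the working text come from the same OCR run, even a misrecognized word can be found verbatim:

```python
def find_pages(pages, snippet):
    """Return the 1-based numbers of pages whose OCR text contains
    the snippet -- including OCR errors, since the searchable PDF
    and the text for translation share the same recognition run."""
    return [i + 1 for i, text in enumerate(pages) if snippet in text]

# "rnethod" is a classic OCR misreading of "method"
ocr_pages = ["Claim 1: a rnethod for...", "Figure 2 shows...",
             "wherein the rnethod further comprises..."]
print(find_pages(ocr_pages, "rnethod"))  # -> [1, 3]
```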



Apr 13, 2012

memoQuickie: character controls

memoQ has a few useful, though somewhat badly organized, functions in the Edit menu and on the toolbar which provide support for special characters or the visibility of non-printing characters (to clean up extra spaces or verify the presence of a non-breaking space, for example).

To toggle the display of non-printing characters (spaces, etc.), click the corresponding icon in the toolbar:

Special characters are a bit disjointed. The omega icon on the toolbar offers a few useful options:


However, if you need other characters, such as mathematical operators, copyright or registered trademark symbols, etc., you must go to Edit > Insert Symbol...


This opens the familiar Windows dialog:


Personally, I think this should be re-organized with that dialog also accessible under the toolbar's omega icon. I like the selection offered there, but adopting the "see more" approach of Microsoft Word would be helpful:





Apr 11, 2012

Notifications in OTM

For more than two years now, I have used the SaaS project management system from LSP.net (the Online Translation Manager or OTM) to handle my project tracking, quotation, delivery, archiving and invoicing workflows. Since early 2010 when the system was first made available to others beyond the group it had served for nearly a decade previously, it has developed rapidly into a unique, full-featured system for translation agency operations. For an individual translator like myself, it is arguably overkill, but its secure delivery features and excellent 24/7 client access to data via private archives as well as the very modest cost and the fact that I have no infrastructure to maintain are the decisive factors for me. I haven't seen anything else I can afford which takes my business as seriously as I do. Or more so, to tell the truth :-) As a matter of full disclosure, I will remind readers of what many already know: I do the English localization for the product, and I have a formal employment relationship with another member of LSP.net's corporate group. But all that is the consequence of liking and using the product and wanting to see it develop further, not the other way around.

The project mail system in OTM is excellent in most respects, but limited. As mentioned previously, it uses text only for reasons of data security, to prevent the transmission of viruses via graphics or JavaScript in HTML mail. Some potential users have not adopted the solution for this reason, but I usually prefer straight text in my e-mail anyway unless I am doing a quick tutorial with screenshots, so I don't care. And it's nice to know that I can't accidentally infect my clients with an "innocent" e-mail. Another "limitation" of the system is that it is web-based, and for quite some time I was vaguely annoyed at having to log in to the system or refresh the Task Board view to see if new mail had arrived. It is not possible to query the mail system via POP or IMAP configurations.

As is so often the case, when we are annoyed by such shortcomings, the solution is usually there for those of us who will RTFM or simply ask the support personnel. One day when I was chatting with the system architect about new features for translation certificates being added, I mentioned my annoyance, and he kindly reminded me of the mail forwarding setting in the user profiles of the administration module. In just a few minutes, the usability of the system increased drastically for me as every incoming e-mail was forwarded to an account I monitor on my laptop and on my smartphone, so I no longer missed interesting inquiries or urgent requests because I had no time to log in to the web interface.

The relevant portion of the user profile is shown below. The critical setting is marked with a red arrow:



Both project mail that has been sorted using the project number and "orphaned" e-mail (no identifiable number present for assignment) are forwarded to the external e-mail address.

Apr 3, 2012

Source text versions in memoQ

This feature of memoQ is slightly controversial at the moment, because the scope of functions available in the Translator Pro and PM editions differs.

The idea behind source file versioning in memoQ 5 is to save time and avoid possible problems of pretranslating from TM(s) which may spit out the wrong material. To use source versioning in memoQ, you must specify it on the first page of the Project Wizard:

In the Project Manager edition of memoQ, the source document versions will be shown; currently that is not the case in the Translator Pro edition, but the various versions of the source text that are imported are tracked in the project just the same:

When a new version of the document is received, it is brought into the project using either the Reimport or the Reimport as... command in the Translations view.


Clicking No in the dialog enables you to browse for the new version, which may have a different name or even a different file format. Here I replaced a DOCX with an ODT file. I could just as well have imported a PowerPoint file for content previously translated in an RTF file.

To use a previous major version as the translation reference for the new file, select Operations > X-translate... (instead of Pre-Translate..., which uses the translation memory), then choose the major version of the source file you want to use and the relevant options. (Each reimport of the source file creates a new "major version". A "minor version" is created when a version is imported, each time a target file is exported for that source file version, and when the version is "finalized" before another source file reimport.)
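The major/minor bookkeeping described in the parenthesis above can be modeled in a few lines. This is a toy sketch of the rules as I understand them; the class and method names are mine, not part of any memoQ API:

```python
class VersionedDocument:
    """Toy model of memoQ 5's versioning rules as described above:
    each source (re)import starts a new major version; exporting a
    target file or finalizing adds a minor version."""

    def __init__(self):
        self.major = 0
        self.minor = 0

    def import_source(self):
        # First import or reimport: new major version, e.g. 1.0 -> 2.0
        self.major += 1
        self.minor = 0

    def export_target(self):
        # Exporting a target file bumps the minor version, e.g. 2.0 -> 2.1
        self.minor += 1

    def finalize(self):
        # Finalizing before the next reimport also adds a minor version
        self.minor += 1

    @property
    def version(self):
        return f"{self.major}.{self.minor}"
```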


Here's the result:

Easy to filter out the old material and translate the new stuff. If you then do a pretranslation (Operations > Pre-Translate...), the TM will supply additional matches that did not correspond exactly to the previous major version selected.

In the Project Manager version of memoQ, it is possible to select various versions and output comparison views of the differences:

Note the fuzzy matches which will come from a second step of pre-translation
I find the HTML comparison table for major versions very useful, and I think it would be helpful for legal translators and financial specialists who are sometimes bombarded with dozens of version changes on tight deadlines. Thus it would be a good thing for Kilgray to reconsider offering this history feature in the Translator Pro edition as well.




Translators' meetup in Berlin-Kreuzberg (April 5)


Dear all,

here is the invitation to the next translators' meetup on:

                Thursday, April 5, 2012, from 8:00 p.m.

This time we are going to an old Kreuzberg institution:

                Max und Moritz
                Oranienstraße 162
                10969 Berlin-Kreuzberg
                U-Bahn: Moritzplatz
                www.maxundmoritzberlin.de

This Berlin tavern is celebrating its 110th birthday this year, and we will be part of it. The menu features German regional cooking, including vegetarian options.

Our table should be on the first floor (the library: up the stairs, then right and right again).


See you Thursday!
Andreas Linke


Apr 2, 2012

Just the facts: real rates and earnings of real translators

For several years now, I have published excerpts on quantity and hourly rates from the rate surveys conducted by the German translators and interpreters association BDÜ. These data are quite extensive and available for purchase at a nominal fee. I have found them quite helpful in understanding the relationship of my rates to the qualified market in the German-speaking region of Europe.

Colleagues and clients in other parts of the world have some problems with these data. First, they are published in German only, and other than my small translated excerpts I do not believe they are easily accessible to those without competence in that language. This is too bad, because the publication also contains data for the translation of some languages besides German to and from English, since German clients sometimes use English as the "gateway" language for large, multilingual projects. Another objection is that the German data allegedly do not reflect the market in other Western countries. Anecdotal evidence from qualified translators I know who live elsewhere often does not support this assertion, but it is heard often enough from members of the poverty cult, erstwhile bottom-feeding resellers of language services and those without the competence or confidence to package their services at sustainable rates.

Thus I welcomed the news that the Chartered Institute of Linguists (CIoL) and the Institute of Translation and Interpreting (ITI) in the UK had published a new survey of rates and salaries for translators and interpreters. This 56-page document offers an excellent overview of the current demographics and economics of the markets in which members of those organizations are active. It also includes valuable information on business practices regarding job cancellation, rush work and more.

Altogether 1750 translators and interpreters responded to the survey conducted in 2011, providing an excellent statistical basis for the report. Over 80% of all respondents (male and female) were freelancers, with an average age of 46 years. The age distribution is normal, so the median age (not reported) is around the average. Fifty-four percent reported English as their native tongue or language of habitual use (I like that term!). Seventy-three percent of respondents were UK residents, with the remainder distributed across Continental Europe and the rest of the world. The median years of experience for freelance and salaried full-time translators was the same (13 years), with slight differences for interpreters (10 and 7 years respectively). One percent of respondents listed "no qualifications", the rest having some sort of relevant degree or certificate/exams.

The median gross income for translation and interpreting, a shockingly low GBP 22,000, was distributed fairly evenly over the range from less than GBP 5,000 per year (pin money for part-timing spouses) to GBP 75,000, with a sharp drop above that and only about 1% reporting a six-figure income. The names of these individuals are a closely guarded secret, and there is probably no truth to rumors that members of the Occupy movement will be setting up camp in their front gardens shortly. A strong majority (80%) reported incomes to be the same as last year or higher, contradicting general mutterings of "plunging rates". Apparently there is indeed a demand for quality despite the reluctance of some talking heads in the MT subsector to admit that such a concept applies to language services.

Over 80% of respondents work for translation brokers, with just under 50% setting their own rates and about 70% arriving at rates by negotiation. So much for the "powerlessness" that is so often cited by the poverty cult when discussing compensation for services.

The report contains a further wealth of data on business practices, CAT tool usage and discounts, voice recognition technology, output and specialties, as well as extensive information on salaried positions, which might be very valuable for assessing one's own status or reasonable terms for working with others.

The section with translation data for the various language pairs is not as granular as the BDÜ report, which has five client categories; here the data are simply divided into "direct" and "agency" clients. Comparing the data to those with which I am familiar from the BDÜ survey, I note that at the present rates of exchange between the euro and the British pound, the figures are lower than I can usually accept, but they are not nearly as grim as many claim. If the pound rises again, some of these rates could look quite reasonable to those living elsewhere. In interpreting the data for planning in other markets, I might use a different theoretical rate for GBP, as I now do for the Australian dollar, which has risen sharply in recent times.
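For what it's worth, the "theoretical rate" idea amounts to nothing more than converting at a fixed planning exchange rate instead of the day's spot rate. A trivial sketch with purely illustrative numbers (none taken from the report):

```python
def rate_in_eur(rate_gbp_per_word, planning_rate_eur_per_gbp):
    """Convert a per-word rate quoted in GBP to EUR using a fixed
    'planning' exchange rate rather than today's spot rate. All figures
    are illustrative, not values from the ITI/CIoL report."""
    return round(rate_gbp_per_word * planning_rate_eur_per_gbp, 3)
```

A conservative planning rate below the current spot rate keeps quotes from looking artificially attractive when the currency later falls.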

In each category, the highest, lowest and most frequent rates are reported along with the number of respondents, the maximum and the median.

Members of the ITI and CIoL received confidential copies of this report a few weeks ago, and private translators forums have been abuzz with the discussion of its implications. There is indeed much to think about, and the modest price asked by the organizations for a copy is well worth the investment. Non-members can purchase the report for GBP 20.00 by telephoning the ITI office at +44 1908 325250 - payment can be made by debit or credit card over the phone. (Please note that credit cards and non-UK debit cards are subject to a 5% processing fee.)






TM-driven segmentation in memoQ

One old feature of memoQ which continues to put cash in my pocket and make my work go faster is TM-driven segmentation. It is a pretranslation option. In theory, it combines and splits segments to improve matches from the TM; in reality it is biased toward combination, which is a good thing, as it emphasizes coherent text chunks.

I recently completed a translation for my least favorite end client of an agency partner I rather like. I suppose the folks at this end client company are nice enough; most probably do not beat their dogs or their children. But the texts they send for translation are abusive in the extreme: Microsoft Word files generated by some sort of program on a host system, with a bizarre mix of colors and font changes (both type and size), as well as lots of superfluous line breaks and carriage returns. I presume the intent of the latter is to avoid overlapping graphics, but since text wrap is turned on for the graphics anyway, I don't see the point. What I do see is German sentences horribly mutilated into as many as five or six chunks, and into at least two or three most of the time. A real crime.

And did I mention that segments break at the color and font changes even for sentences which appear intact? No CAT guru has ever been able to figure that one out.

One such horror revisited me last week, and I put it off as long as I could. Finally, I got to work at a point where the deadline was very much in doubt, and as an afterthought I did something I usually forget about: I pretranslated the file, applying the TM-driven segmentation option, which is not considered in the file analysis. To my amazement, most of the file pretranslated with matches over 95%. When I examined the remaining empty segments and joined four or more parts to make a sentence, most were 99% matches. I had completely forgotten that I had translated this material a year ago. The agency was unaware of that as well, because they rely on traditional Trados methods for file analysis and processing. What I expected to be a very hard slog through about 500 horrible segments turned out to be a bit of tag tweaking and a few sentences of updating.

This is part of what my agency friends who have gone over to memoQ mean when they talk about improved leverage over time from legacy resources.

To demonstrate how this works, I took a bit of text on "technical terminology" from Wikipedia and prepared it as a text with coherent sentences and also as a text with lots of superfluous carriage returns like one might find with text copied from a PDF file, for example:

I translated the file with intact sentences in memoQ, then ran an analysis using the Operations > Statistics... function:

The file's segments looked like this in memoQ:

Then the file was pretranslated using the TM-driven segmentation option:


This was the result:

The exclamation marks indicate missing tags, which may cause problems. In cases like this I usually insert them at the end and clean up the spacing in the output target file. And if I send a TMX to someone I clean the crap tags out of it with a search and replace operation in a text editor.
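That search-and-replace operation can just as well be scripted. Here is a sketch in Python using the standard TMX inline elements (`bpt`, `ept`, `ph`, `it`); this blunt regex pass assumes the tags are well formed and not nested:

```python
import re

# Match TMX inline tag elements, either paired with content
# (<bpt ...>...</bpt>) or self-closing (<ph .../>). Assumes well-formed,
# non-nested inline markup; a real XML parser would be more robust.
TMX_INLINE = re.compile(
    r"<(bpt|ept|ph|it)\b[^>]*>.*?</\1>"   # element with content
    r"|<(?:bpt|ept|ph|it)\b[^>]*/>",      # self-closing element
    re.DOTALL,
)

def strip_inline_tags(segment):
    """Remove inline tag elements from a TMX segment string."""
    return TMX_INLINE.sub("", segment)
```

Note that this leaves any spacing around the removed tags untouched, so a pass to normalize whitespace may still be needed afterwards.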

To satisfy my curiosity I then deleted the contents of the TM and made a new source file with a couple of broken segments:

Note the lesser quality of what will be going to the TM. This is the diet Trados users have enjoyed for a long time or, for that matter, what anyone who uses a CAT tool without the ability or knowledge to join segments may swallow routinely. After that was sent to the TM, I re-translated the file with intact sentences:

In Segment 1, a split was made, but no pretranslation was done of the fragment (even though it was in the TM as "101%"). In Segment 4 the sentence was not split but instead taken as a fuzzy match. The information pane at the right of the translation window shows the differences with the TM information:

I am not disturbed by the more restrained matching when splits are involved. I consider it a good thing: a feature that encourages users to wean themselves off the bad practice of "translating" text which has been impossibly chopped up. Smart translators make frequent use of the segment joining and splitting functions in a good CAT tool, and memoQ rewards this habit particularly well.
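Incidentally, the superfluous line breaks that cause this chopping in the first place can also be repaired before import. Here is a crude heuristic of my own devising, not what memoQ does internally: any line that does not end in sentence-final punctuation is joined to the next one.

```python
def rejoin_broken_lines(text):
    """Join hard line breaks that fall mid-sentence: a line that does
    not end in sentence-final punctuation is glued to the following
    line. A crude heuristic for PDF-style text, not memoQ's algorithm."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # drop empty lines entirely
        if out and not out[-1].endswith((".", "!", "?", ":")):
            out[-1] = out[-1] + " " + line
        else:
            out.append(line)
    return "\n".join(out)
```

It will misfire on headings and list items that end without punctuation, which is one more reason to review the result before importing it into any CAT tool.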