Jan 31, 2014

Re-importing reviewed translations in memoQ server projects.

One of the unexpected benefits of testing the memoQ cloud server is that it gives me a good opportunity to reproduce and test some of the disaster scenarios encountered when working with project managers not fully aware of the implications of their choices when setting up server projects. Many of the problems that come to my attention relate to revision workflows that many experienced translators like to use.

For various reasons, exporting bilingual formats - XLIFF, Wordfast Classic-compatible DOC or RTF tables - is a popular review method. Sometimes these are checked by others who do not use memoQ, sometimes they are convenient for QA with third-party tools or have other perceived advantages. As far as I know, translators can always do bilingual exports from a local installation of memoQ connected to a server project. (I haven't looked for ways to block this, because I find the notion of doing so extremely counterproductive.)

The trouble comes when they want to re-import the corrected and/or commented bilingual file to update the translation. This is possible only by the project manager working in the management window. There's no way for the translator to import a bilingual reviewed document. I asked Kilgray Support about this and was told that this is intentional because of the difficulties which could result in the project. So basically if you edit a bilingual, someone with project manager privileges for that project has to re-import it for you.

Well, not always. Sometimes it works, just a bit differently than one might imagine.


If the project manager sets up the project to use "desktop documents" (as opposed to "server documents"), then it is possible to export bilingual files and re-import them. This cannot be done directly with documents in the Translation list. But it will work with Views of these documents. Or with the full bilingual exports of the documents themselves!


The screenshot above is from a server project configured for desktop documents. For these two project types (with or without web translation enabled), when working from a memoQ desktop client you are able to import any bilingual to update a translation file from this interface.

But wait, that's not all!

or is it? That command says "Import" and so devious minds might wonder if it is possible to import something other than a bilingual export from one of your translation documents. Indeed, the dialog that appears for file selection tantalizingly offers all supported formats. So I grabbed a DOCX with a financial text and gave it a try:


SWEET SUCCESS! A mere translator, I've cracked the memoQ server and uploaded another document to my project. Visions of Caribbean beach vacations in the warm sun dance through my head as I contemplate all the extra work I can upload to certain client projects and bill because it is, well, right there on that server project they assigned to me.... then I get this message:

General error.
TYPE:
System.NullReferenceException

MESSAGE:
Object reference not set to an instance of an object.

SOURCE:
MemoQ.Project

CALL STACK:
   at MemoQ.Project.ProjectDocument.TranslationDocumentProjectContext.UpdateDocumentDivisionInfos(TranslationDocumentCore doc)
   at MemoQ.Translation.Storage.SqlCeStorageService.SaveDocument(TranslationDocumentCore document, SavePreferences savePref)
   at MemoQ.Translation.Storage.SqlCeStorageService.SaveDocumentAndAllInfos(TranslationDocumentCore document, ICompactSerializable workflowInfo, ICompactSerializable tagDefinitions, ICompactSerializable lqaModel, SavePreferences savePref)
   at MemoQ.Project.TranslationDocImportExport.LocalImportController.doImportOrReimport(ImportTask importTask, String targetLangCode, String docStorageDir, Boolean reimport)
   at MemoQ.Project.TranslationDocImportExport.LocalImportController.DoJob()


That's Geek for "Nice try, buddy... an automated report has just been sent to the NSA and our agents will be at your door shortly." I click again, and that message self-destructs and I am given a second warning:


A knock on the door, then after a stern interview, I sink back into my desk chair and click Continue. Next time I'll stick to importing bilingual exports of documents or views from the project. That works beautifully from the View tab and allows me to work as I prefer, since by now most of my clients with memoQ servers know to use the desktop documents options for my projects. Perhaps in the future, Kilgray's programmers might tighten up the code to trap errors from fools like me who do the unexpected.



But of course, as of memoQ 2013 R2, when translating with the desktop client in server projects, one can usually update a translation with minor edits using the reviewed monolingual target document and the Import reviewed document command in the Translations menu of the project. This won't let you bring in comments, and it does have some (but increasingly fewer) quirks, but in many cases it works quite nicely. A video demonstration of this feature can be seen here.

Jan 29, 2014

Finding resources on Kilgray's Language Terminal

Kilgray’s online platform for translation, Language Terminal at https://www.languageterminal.com/, may be a game-changer in many ways. Not only does it offer affordable, on-demand memoQ translation server capacity for small teams, it provides free InDesign server availability to users of any tool for converting InDesign formats to XLIFF and PDF for translation and review, back-up features fully integrated with recent versions of memoQ, some evolving project management and invoicing tools and a growing library of light resources shared by users. This post discusses how to find and use these resources, which can be useful in all supported versions of memoQ.

Accessing your account
The user menus of Language Terminal can be accessed in two ways: in a web browser from the URL above or from the link on your memoQ Dashboard.


If you are not already a Language Terminal user, a free account can be set up in just a few minutes.

Looking for resources
The current user interface for finding resources on Language Terminal is confusing to some users. The Resource menu link in the orange navigation bar shows a list of resources you have uploaded yourself to Language Terminal. The dropdown list indicated by the arrow filters your own resources. To find resources from other people, click the Advanced Search button.


There is nothing “advanced” about this search. It simply allows you to use four fields to find resources which are publicly available on the site. Be careful of your selection criteria for language as some resources (like auto-translation rules) are not language-specific by definition even if they might have been created for use with a particular language.


The result of the search for English stopword resources to be used in terminology extraction to filter out “noise” words (like prepositions, pronouns, articles and common vocabulary) looked like this at the time I performed the search:


Download the resources you want by clicking on their names in the Resource column. The shared library of filters, QA profiles, auto-translation rules, stopword lists and more on Language Terminal continues to grow. Why not contribute something yourself?

In any case, Language Terminal is a useful place to archive one’s valuable light resources, such as segmentation rules developed over time with great effort, and these are not shared with others unless you specifically release them. Given the occasional unfortunate “disappearances” of light resources known to occur with some memoQ upgrades, this is a very useful backup option to have, and it would be nice if future integration of Language Terminal and memoQ were to facilitate more complete, automated resource backups from desktop systems.

Jan 28, 2014

On a role with memoQ!

This morning I finished off a job I shared with another colleague, sent him the target document and a memoQ MQXLZ file for any edits he might want to make before dumping it in his TM or LiveDocs archive. I went on to other things and didn't see his message until about six hours later. He could not edit my bilingual file! What was wrong?!

I was "on a role" you might say. But not on a roll. Not today. I had not stopped to think that although the method I had used to review my work is popular with some freelance translators using memoQ, most of us do not use roles in our own projects, and few understand what they actually do. But roles can be an effective addition to our workflows and enable us to keep a better overview of a translation's status.


When working in a local memoQ project, you can assume one of three roles at a time: translator, first reviewer or second reviewer. The default role is always "translator".

Each role applies a different status when segments in the translation grid are confirmed:
  • Translator = Confirmed
  • Reviewer 1 = Reviewer 1 confirmed
  • Reviewer 2 = Proofread
The confirmation settings you choose under Project home > Settings determine which of these three roles you work in. You can also set your role in recent versions of memoQ using the dropdown menu in the far right section of the translation window's toolbar. The icon for the dropdown menu reflects your current role:


When segments are confirmed in different roles, the status icon will be different:


The screenshot above shows four segments confirmed with three different roles. Someone in the Translator role or Reviewer 1 role cannot change segments which have been confirmed by someone working in the Reviewer 2 role. These "proofread" segments are protected and can only be modified in the Reviewer 2 role. However, the Translator role can change unwanted edits made in the Reviewer 1 role.

What use is that to a translator working alone? For a long time, my instinct was to say "none at all" and think of this as a feature of interest only to team processes. Too often I'm a team of one these days.

But on a number of jobs lately, I've found myself scribbling Post-It notes about which parts of a long job I had already checked, and it occurred to me that these roles might help. I've also grown fond of the X-Translate feature for new source text versions, where I usually protect content which has been carefully reviewed before if it has not changed. So the idea of doing a "final review" of some sections and protecting them by confirming in the Reviewer 2 role was appealing.

So, like some others, I have begun to switch roles in memoQ when I am carrying out different tasks. And the different confirmation icons - the check, check+ and double-check - give me a simple visual clue as to where I have worked with what purpose.

Another nice feature I use a lot is the Row History in the context menu. It shows me the translation of that segment for each minor version. If I am making a lot of edits and want to remember a particular translation, I use Operations > Create Snapshot... to make a new minor version for later reference. Minor versions are also created automatically with various operations such as target text export and exporting bilinguals. The row history shows all versions of the translation for the current source text (major version):


Text of previous translations can be copied to restore using Ctrl+C. Unfortunately, despite requests, Kilgray has not yet seen fit to do the obvious and add a button for conveniently inserting other versions of a translation in the row history.

So what about my frustrated colleague who couldn't update the text in my "proofread" bilingual? How can he, a mere Translator, override my work in a superior role? By giving himself a "promotion": choosing the Reviewer 2 role from the toolbar or setting his confirmation status in the project settings to Proofread.

For a discussion of how roles can affect what is written to a TM, see this old article on memoQ TM settings. The defaults have now been changed so that roles are no longer automatically stored in the TM, so if you are working with the default TM settings for memoQ 2013 R2 Build 53 or later, you don't have to worry about changing the defaults as I discuss in the old article.

Jan 26, 2014

Medical translation webinar: hematology and immunology

On February 6th, Dr. Helen Genevier, a degreed specialist in applied immunology and freelance French to English biomedical translator, will present a 1-hour webinar on hematology and immunology as part of eCPD's medical translation series. Although there will be specific examples of potential pitfalls in French to English translation, the emphasis of the presentation is on understanding the relevant science and English terminology. This is an excellent opportunity for those working with medical texts to learn from an expert scientist and translator in this specialty. Further details on the talk, Dr. Genevier's background and registration are available here; a current schedule for the full eCPD program can be seen by clicking the logo at the right.

Jan 23, 2014

memoQ auto-translation regex blues

Can you write a regular expression that matches each of the three character sequences marked in green below?
abcdefg
abcde
abc
That's how one interactive tutorial I found starts off teaching regular expressions. I found that a bit puzzling, then noticed that the right side of the web page offered a list of notes on "regex" expressions. I've spent too much time in the last two weeks trying to sort out a variety of memoQ auto-translation rules with regular expressions, so the answer came quickly and I typed it in the text field. Then I looked again and realized there were a few alternatives that would work. So I tested them. And then a few more. Various expressions that would work include
[a-z]+
[a-g]+
[\w]*
[abcdefg]*
.+
\D+
and quite a few others. But which is correct? As with word choice in translation, that depends on context, and that's where it gets hard.
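
To see what "depends on context" means in practice, here is a minimal Python sketch (Python's re module, not the memoQ regex engine, so treat it as an illustration only). It checks that all six expressions above match the three tutorial strings in full, then shows how they part ways on a few extra inputs. The extra sample strings are my own assumptions, chosen only to show the differences.

```python
import re

# The six alternatives listed above, exactly as written in the post.
PATTERNS = [r"[a-z]+", r"[a-g]+", r"[\w]*", r"[abcdefg]*", r".+", r"\D+"]

# The three strings from the tutorial exercise.
TARGETS = ["abcdefg", "abcde", "abc"]


def matching_patterns(text):
    """Return the patterns that match the whole of `text`."""
    return [p for p in PATTERNS if re.fullmatch(p, text)]


# Every expression matches every tutorial string in full.
for target in TARGETS:
    print(target, len(matching_patterns(target)) == len(PATTERNS))

# Hypothetical extra inputs: here the expressions no longer agree.
for sample in ["abc123", "ABC", "xyz", ""]:
    print(repr(sample), matching_patterns(sample))
```

The expressions are interchangeable only as long as the text never contains anything but the letters a to g. As soon as digits, capitals or empty strings can turn up, each one draws the line in a different place.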

Those who have delved into the configurations of various translation environment tools such as SDL Trados Studio or memoQ have seen that regular expressions are used in different ways to identify patterns in information and then filter or transform that information based on those patterns.

Although there is great power in regular expressions, I see their current role in memoQ configuration as more of a liability as far as the average user is concerned. Regular expressions are currently part of memoQ configuration for at least two import/conversion filters (a text filter and a tagger), segmentation rules and "auto-translation". I find the last feature particularly troublesome in its current form.

There are many misunderstandings about what auto-translation is in memoQ. It has nothing whatsoever to do with machine translation, which some propagandists prefer to call "automatic translation" to gloss over the many difficulties it can cause. Nor is it part of the pre-translation feature, though it can be applied in pre-translation to deal with things like catalog numbers, dates and figures in tables rather efficiently.

Auto-translation in memoQ is used to convert certain patterned information into the format needed in translation. In monolingual editing projects, it can also be used to unify the formatting used for things like dates and currency expressions.

memoQ ships with a number of standard rule sets for number conversions. The "English group" that I use consists of ten different rules for converting most of the screwy number formats I encounter with separators for decimals (periods or commas) and 3-order magnitude groups (thousands, millions, billions, etc.) that might be grouped with spaces, apostrophes, periods or something else. The rule for converting hundreds of millions with decimal fractions to my usual preference is:

(?<!(,|\.|\d|\d\s|\d'|\d’))([-|\u2212]?[\d]{1,3})(?:\.|,|\s|'|’)(\d\d\d)(?:\.|,|\s|'|’)(\d\d\d)(?:\.|,)([\d]{1,2}|[\d]{4,})(?!(,\d|\.\d|\d|\s\d|'\d|’\d))

with the replacement rule $2,$3,$4.$5

Pretty damned intimidating for most of us. In fact, most of the people who grasp the basics of regex syntax will scratch their heads over the number assignments of the groups until they realize that non-capturing groups in parentheses (like (?:\.|,)) don't count, while the capturing group inside the lookbehind at the start does (which is why the replacement begins with $2). Nobody points that out in any tutorial I've read.
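
The group numbering is easier to see in a cut-down example. The following Python sketch is an assumption-laden stand-in, not the rule that ships with memoQ: it handles only numbers like 123.456.789,12, uses a simple character class in the lookbehind (Python requires fixed-width lookbehinds) and writes the replacement with \1 where memoQ uses $1.

```python
import re

# Simplified stand-in for the shipped rule (my assumption, not Kilgray's expression).
PATTERN = re.compile(
    r"(?<![\d.,])"     # lookbehind: do not start in the middle of a longer number
    r"(-?\d{1,3})"     # group 1: leading one to three digits
    r"(?:\.|\s|')"     # non-capturing separator: gets no group number
    r"(\d{3})"         # group 2: millions-to-thousands block
    r"(?:\.|\s|')"     # non-capturing separator again
    r"(\d{3})"         # group 3: last three digits before the decimal comma
    r","               # decimal comma of the source format
    r"(\d{1,2})"       # group 4: one or two decimal places
    r"(?!\d)"          # lookahead: the number must end here
)


def to_english_number(text):
    """Rewrite 123.456.789,12 as 123,456,789.12."""
    return PATTERN.sub(r"\1,\2,\3.\4", text)


print(PATTERN.groups)  # number of capturing groups in the simplified pattern
print(to_english_number("Umsatz: 123.456.789,12 EUR"))

# A capturing group inside a lookbehind is still counted, which is what
# shifts the numbering in the shipped rule so that its replacement starts at $2.
print(re.compile(r"(?<!(\d))(\d+)").groups)
```

Only the plain parentheses receive numbers, in the order in which they open. Anything that starts with (?: is skipped when counting.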

If I suggested to most of my esteemed colleagues that they really need to learn this stuff, I think most juries in the civilized world would refuse to convict them of murdering me for the mental cruelty inflicted. There are brilliant translation technology consultants like Marek Pawelec, who eat stuff like this with their breakfast cereal and are a priceless resource to colleagues and corporate clients who need their expertise... and then there are the rest of us.

Kilgray CEO István Lengyel told me recently that there are plans to expand the examples shipped with memoQ later this year to include some reformatting for certain date structures and other information. Language Terminal has a few examples of useful conversion rules for dates, unusual number formats, e-mail addresses and more. You don't need to know regex to use these, just how to download the MQRES files and click Import in one of the memoQ modules for managing auto-translation rules to bring these into your set-up. Once there they can also be used in QA checks.

I think it would be nice if Kilgray or someone more expert than yours truly would produce a "recipe book" with clear documentation of examples so that users of more limited skill (like me) can adapt these examples to their specific needs. Some typical uses I see (some of which have eaten my evenings recently) are
  • stripping or adding spaces from numbers with percent signs, such as 37,5 % <> 37.5%
  • formatting other numbers with units according to a preferred convention, such as 1,2A >> 1.2 A
  • currency expression reformatting like
          TEUR 1.350 >> EUR 1,350 thousand
          34.664,45 €  >> €34,664.45
          € 1,2 Mrd.  >> €1.2 billion
          & cetera
  • Legal references such as
          § 15 Abs. 1 Nr. 3 GebrMG
          § 15(1) Nr. 3 GebrMG
          § 9 S. 2 Nr. 1 PatG
  • page designations and other elements often found in bibliographies and easily overlooked (for QA purposes)
  • conversion of dates like 23.05.67 to
          23 May 1967
          May 23, 1967
          1967-05-23
              or whatever other format one prefers, with or without non-breaking spaces
  • Lovely EU legislation designations like 93/42/EWG >> 93/42/EEC
  • Telephone number reformatting such as (0211) 45 66 - 500 >> +49 211 4566-500
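
As a rough idea of what the simplest recipes in such a book might look like, here is a Python sketch of the first two conversions in the list. It is only an illustration in Python's re syntax, not a tested memoQ rule, and the short unit list is a hypothetical placeholder that would have to be extended for real work.

```python
import re

# Hypothetical unit list; a real rule set would need many more entries.
UNITS = ["A", "V", "W", "kg", "mm"]

# 37,5 % -> 37.5% (decimal comma to point, space before the percent sign dropped)
PERCENT = re.compile(r"(\d+),(\d+)\s?%")

# 1,2A -> 1.2 A (decimal comma to point, exactly one space before the unit)
WITH_UNIT = re.compile(r"(\d+),(\d+)\s?(" + "|".join(UNITS) + r")\b")


def english_percent(text):
    """Convert German-style percentages to the English convention."""
    return PERCENT.sub(r"\1.\2%", text)


def english_unit(text):
    """Convert German-style decimal numbers with units to the English convention."""
    return WITH_UNIT.sub(r"\1.\2 \3", text)


print(english_percent("37,5 %"))
print(english_unit("1,2A"))
```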
Some might wonder about the date conversion above with the two-digit years. As a veteran of the Y2K scam, I'm fond of two-digit years; they were part of my ticket over the Atlantic. The implementation of regular expressions in memoQ allows for the use of custom reference lists, which are delimited by hash symbols (#) in the expressions. So when I created my rule for converting those two-digit years in dates
(\d{1,2})\s?\.\s?(#month-num-to-text#)\s?\.\s?(#21st-Century-2digit#)  and
(\d{1,2})\s?\.\s?(#month-num-to-text#)\s?\.\s?(\d{2})

with the replacement rules $1 $2 20$3 and $1 $2 19$3 respectively,
I used two custom lists, one with translation pairs like 01 = Jan. and the other with the two-digit years I felt like assigning to the 21st century (i.e. the ones I am most likely to encounter that belong there, like 00 through 19; I'll have to adjust this in some cases).
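
For readers who want to experiment outside memoQ, the two rules and their custom lists can be imitated in Python. This is a sketch under several assumptions: the dictionary and list below merely stand in for the #month-num-to-text# and #21st-Century-2digit# lists, the month abbreviations are my own choice, and the rules are simply applied one after the other, which may not be how memoQ evaluates them internally.

```python
import re

# Stand-in for the #month-num-to-text# list (hypothetical abbreviations).
MONTHS = {
    "01": "Jan.", "02": "Feb.", "03": "Mar.", "04": "Apr.",
    "05": "May", "06": "June", "07": "July", "08": "Aug.",
    "09": "Sept.", "10": "Oct.", "11": "Nov.", "12": "Dec.",
}

# Stand-in for the #21st-Century-2digit# list: 00 through 19.
CENTURY_21 = [f"{year:02d}" for year in range(20)]


def alternation(items):
    """Turn a list into a regex alternation, as a list reference would."""
    return "|".join(re.escape(item) for item in items)


DAY_MONTH = r"(\d{1,2})\s?\.\s?(" + alternation(MONTHS) + r")\s?\.\s?"

# Rule order matters: the specific 21st-century rule comes first.
RULES = [
    (re.compile(DAY_MONTH + "(" + alternation(CENTURY_21) + ")"), "20"),
    (re.compile(DAY_MONTH + r"(\d{2})"), "19"),
]


def convert_dates(text, rules=RULES):
    """Apply each rule in turn, writing day, month name and four-digit year."""
    for pattern, century in rules:
        text = pattern.sub(
            lambda m, c=century: f"{m.group(1)} {MONTHS[m.group(2)]} {c}{m.group(3)}",
            text,
        )
    return text


print(convert_dates("23.05.67"))
print(convert_dates("23.05.14"))
# With the rules reversed, the general rule grabs the date first.
print(convert_dates("23.05.14", list(reversed(RULES))))
```

In this sketch the reversed order sends 23.05.14 back to the 20th century, because the general rule with (\d{2}) matches every two-digit year and leaves nothing for the more specific rule.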

In the case of my two-digit year conversion rules, the rule order is important. The conversion will not work as planned if the rules appear in reverse order. THIS IS A MAJOR PROBLEM with the current rules editor for memoQ auto-translatables. Each time a rule is edited, it goes to the bottom of the list. I'm currently working on a complex set of about 20 rules for converting financial expressions, and rule order is critical for several subgroups of rules (this was disputed after I originally made this post and I backed down... however, subsequent tests have proven that rule order is indeed critical!!!). So editing them in memoQ is a nightmare. (One expert told me how he generates a basic rule set in memoQ, exports it and does all further rule editing in Notepad++, which also allows him to keep better track of his work with comments.) Some current problems with the edit dialog for auto-translation rules in memoQ are
  • the need for "order stability" for rules being edited to maintain grouping (for a better overview) and for arrow buttons to move rules up and down easily for better grouping
  • insufficient field width/height and bad scrolling behavior, so that it is very difficult to edit long expressions - I usually have to paste them into Notepad to keep an overview while I work
  • strange, severe bugs in the test window, so the rule results shown are sometimes not accurate; I deal with this by adding a test document with sample data to the project and looking at what the rules do with that text
  • helpful <!-- comments inserted in the rules --> to explain them disappear when MQRES files are imported, and there is no way to maintain explanatory comments to keep track of one's own work in the editor (except for the pitifully limited comment box in the resource properties dialog)
A few people I've mentioned auto-translatables to have said to me that they would have no use for them, because they use dictation software. I do that myself, but I find that in many cases (like for those legal references) it is nicer to have a rule configured to client preferences, so I can use a single keystroke to insert
Section 55(2) No. 3 Sentence 2
Section 55 Paragraph 2 No. 3 Sentence 2  or
Sect. 55(2) No. 3 S. 2
as the job calls for. And run a QA check to confirm that I have formatted consistently:


Dominique Pivard posted a nice video about memoQ auto-translatables on his "vlog" a while ago. It's worth a look if you want to see how to create these resources and see a good demonstration of how they work.

Jan 22, 2014

memoQ cloud: a team server "on tap"

This afternoon, Kilgray CEO István Lengyel held one of the best webinars I've seen him do yet to describe the convenient new hosted server facilities known as memoQ cloud, which I reviewed recently.

In the webinar, he explained the company's evolution of thought for online computing and how concerns about security were finally resolved to create a more sustainable offering than the more support-intensive "honeymoon" server solution.


He made it clear how existing desktop licenses for the Project Manager and Translator Pro editions can be used in combination with concurrent access licenses (CALs) for the server, as well as how cloud services can be suspended for periods in which they are not needed, saving considerable costs for those with only occasional needs to work in a coordinated online team.


Backing up the server configuration can be done quickly and easily from a Language Terminal account, so if cloud service is dormant for more than three months (after which data are deleted from the server), everything can be restored quickly when needed.

The webinar also included a demonstration of the integrated translation in web browsers, memoQ WebTrans. This is one way of providing access to the server for others who do not have installed copies of memoQ, or of working on your server when using other computers. Of course this interface also works in web browsers under other operating systems, such as MacOS or Linux. (Click on the graphic below to get a full-sized view of the web translation interface.)


Access to Kilgray's premium terminology server qTerm and memoQ server APIs is also available for an additional subscription fee. Subscribed services can be changed at any time as your needs evolve.

In the webinar, István showed how in about the same time it takes to enjoy a cup of coffee, one can get a free Kilgray Language Terminal account and register with a credit card for a month's trial of the memoQ cloud server (with any services available) for just €1/$1. If you are trying out services which you will not want beyond the trial period (like the API, qTerm or extra licenses), these can be set to cancel at the end of the trial period to avoid unwanted charges.

The embedded video below is a 20-minute tour of how simple it is to set up and manage projects in memoQ cloud. Use the icon at the lower right of the video frame to watch this on your full screen.


This is a good overview of the process, although the licenses aren't explained very well, and the project type recommendation is bad advice in many cases, as I pointed out in my post on segmentation in server projects and projects with desktop documents. Everything else in the video is good, but it's often very important to allow segmentation to be changed or corrected, particularly if the segmentation rules used in the project do not cover abbreviations which may split sentences in very unfortunate ways. If you need to have instantaneous access to work from other team members by using online documents, then the segmentation will need to be checked very carefully and corrected before the project begins to avoid difficulties.

Those testing the memoQ cloud server or using desktop editions of memoQ may also want to check out various free configuration resources on Language Terminal. These include special QA profiles, AutoCorrect files, import filters that are not part of the shipping product and auto-translation rules for easier translation of number and date formats, etc. Language Terminal offers other facilities which may be of interest even to those who do not use memoQ, such as the free InDesign server, which can create PDF previews of InDesign documents (very useful for reviews before delivery) or convert InDesign files of any type to XLIFF for translation in many different environments.

UPDATE:
The memoQ cloud webinar is now available to watch on Kilgray's page for recorded webinars; it can be accessed directly here or viewed in the embedded video below.

Jan 15, 2014

The memoQ TM search tool

This feature lets you use memoQ TMs to look up text from another window, for example a Microsoft Word document or another translation tool. It can also be used in translation environments that have no translation memories or in which access to them is restricted.


As you copy text from your working window, the content of your clipboard is automatically transferred to the search window and the matches appear.

Ctrl+Shift+Q     Starts the memoQ translation memory search tool, which immediately looks up any text on the Windows clipboard.
Ctrl+C     Copies new text to the translation memory search window and runs a search.
Ctrl+Alt+C     Copies the target text of a selected match to the clipboard.
Ctrl+Shift+C     Copies the source text of a selected match to the clipboard.
Ctrl+V     Pastes the text from the clipboard into another application.

In the TM search tool, you can select any of your translation memories to use in other applications. In the search settings menu of this window, you can also set other parameters, such as the minimum match percentage, the penalties for alignments or the strictness applied to tags.

A translation memory selected in the TM search tool cannot be used or opened in memoQ while the search tool is active, nor can the search tool access a TM that is attached to a project open in memoQ. An orange lightning bolt is displayed in the list of translation memories to indicate this condition. Once the search tool is closed, the translation memory becomes available again for use in memoQ. And when you close the project, the corresponding TM or TMs become accessible to the search tool again.

The search tool can also be useful as a second concordance in memoQ for searching a particular set of translation memories that are not attached to the open project.

*****

Excerpt from memoQ em Pequenos Passos, the Portuguese version of the second edition of my memoQ tips book. Translated by Cátea Caleço Murta.

Jan 14, 2014

The virtue of virtual machines for translation

The year 2014 started for me with new stationary working hardware and plans for using it via remote access as some of my clever colleagues and friends have already done for years with their high-powered desktop systems. For more than a decade I've worked on laptop computers, but a RAM-loaded tower with solid state drives just makes more sense now for the kinds of work I do.

The new configuration is particularly well-suited for using virtual machines, such as those which can be created with VirtualBox or VMware. I have used VMware for more than a decade now, mostly as a way to continue using ancient translation dictionaries which cannot be run under newer versions of the Windows operating system, but also as a way of using some Linux tools on a machine that is otherwise configured for other operating systems. This time I'm going a little further and using a VMware configuration to quarantine SDL Trados Studio 2014 and keep its current difficulties with Java from affecting the rest of my system. I wish I had done this with earlier versions of SDL Trados, and I paid the price of stupidity often enough by getting my Microsoft Office installations screwed up every time.

Virtual machines also solve another problem I face when documenting workflow solutions for memoQ and comparing them in different versions. It's rather a nuisance to close one version and open another simply to look at a dialog or make a simple screenshot; now I can simply launch the older version of the software in the window for a virtual machine. Time saved. Less stress.

Although I do have an old VMware Workstation license, I'm now using version 6.0 of the free VMware Player, which is able to create its own virtual machines. The free converter from VMware also enables me to create virtual machines from my old laptop and netbook configurations. I also plan to try Windows XP Mode under Windows Virtual PC for some applications.

What are the potential benefits for other translators to use virtual machine solutions?
  • Continued use of older dictionaries or software versions for which updates may not be available or needed
  • Safe testing and/or isolation of new software or upgrades you don't trust
  • Use of other operating systems or software versions for documentation. When I did a lot of software manual translation years ago, I used several virtual machines with different versions of Windows in various languages to keep track of differences in system paths, etc.
  • Access to tools or other resources available only for another operating system.
Shared folders allow easy passing of data between virtual machines and the host system (your main computer configuration). There are probably a number of other benefits; I've only listed the ones I have made use of many times over the past 13 years and will use to a greater extent once again.

Jan 13, 2014

Locking out other languages in memoQ source texts

One of the interesting and useful results of Kilgray introducing document language recognition features in memoQ 2013 R2 is the ability to identify and exclude segments in other languages. I see this sort of thing from time to time in German patent dispute documents which quote English patent texts extensively or in texts to translate where new source language material may have been added to an existing translation. In the past, I prepared such texts for translation by hiding the text which is already in the target language or is in a language I cannot translate (such as French) or I locked it manually, which can be time-consuming to do in a long text. Now the task of preparing such texts for translation is a little easier.


The screenshot above shows a patchwork document with German and English. The hundreds of segments in this job were a wild mix of the two languages with unfortunately few coherent blocks of the source language (German). To save time in preparation, I selected the option in the Operations menu to lock the segments:



The result of the locking procedure looked like this:


Most of the English segments were copied source to target and locked. The differentiation of languages is performed using statistics and is rather good but not perfect. In slightly under 400 segments, there were 5 or 6 that were not correctly identified and locked. Several of these were in the bibliography and consisted of a long string of names and one or two short English words or abbreviations. I saw no false positives (source language misidentified and locked), though I did hear a report of some from another translator working from Dutch to English with a very large mixed document. Discussions with Kilgray Support revealed that a "failure rate" of about 1-2% may be experienced for this feature.

So what good is it? A lot, really. It enabled me to do a quick estimate of effort and separate the two languages so I could make a reasonable assessment of the separate efforts for proofreading the English and translating the German. Obviously, if I were a project manager preparing a file for somebody else to translate, I would need to do manual checking of the segments to correct any errors of identification. But this feature would still often save me a great deal of time in preparing the file, and manual checking is important to do anyway to ensure that there are no segmentation problems which may cause difficulties in translation.

Do you work with mixed language documents where this feature might be relevant? If you do, have you tried this yet? What has your experience been with your language pair(s)?

Jan 10, 2014

memoQ AutoCorrect update & MS Word export macro

Last summer I wrote about autocorrection of text in memoQ and offered an indexed embedding of a video I created to give an overview of the AutoCorrect functions in memoQ 2013. There have been a few enhancements since then in memoQ 2013 R2; where only "smart quote" toggling was possible before, there are now various options for correcting accidental miscapitalization.

I've also been looking to optimize the procedure for migrating the Microsoft Word autocorrection lists to memoQ. There are a number of problems with using the table-generating macro that Kilgray suggests in the knowledgebase article on using MS Word 2003 autocorrect data; when I created a 17,000 entry list from a large AutoCorrect file for one language, it was nearly impossible to do anything with it because of memory problems. The following macro, which could be put into the Normal template in MS Word, should be a little easier to work with:
Sub BuildAutoCorrectList()
  Dim ACE As AutoCorrectEntry
  ' Create new document.
  Documents.Add
  ' Iterate through AutoCorrect entries.
  For Each ACE In Application.AutoCorrect.Entries
    ' Insert each entry name and its value on a new line.
    Selection.TypeText ACE.Name & vbTab & ACE.Value & vbCr
  Next
End Sub
Invoke the macros dialog in MS Word with Alt+F8. Select the Normal.dot or Normal.dotm file (depending on your version of MS Office) from the dropdown list, enter the name of the new macro and click the Create button. Then paste in the code above. When the macro is run, it will create a new document with the autocorrection list in tab-delimited text. To bring the list into memoQ, you'll have to
  1. Paste in the XML header needed by the "light resource" for AutoCorrect lists in memoQ. You can see what this looks like for the language setting you want by creating a dummy resource, exporting it and opening the file with a text editor. European Spanish might look like this, for example:
    <MemoQResource ResourceType="AutoCorrect" Version="1.0">
      <Resource>
        <Guid>6d61e3bc-da00-4cb8-a4f3-93c980543bba</Guid>
        <FileName>spa-ES#EU Spanish AutoCorrect.mqres</FileName>
        <Name>European Spanish</Name>
        <Description />
        <Language>spa-ES</Language>
      </Resource>
    </MemoQResource>
     
  2. Save the file as plain text with UTF-8 encoding.
  3. Change the file extension to "*.mqres"
  4. Import the resource to memoQ.
AutoCorrect lists which are language-neutral (for example, lists of company names) use "all#" in the name and "Neutral" between the tags.

Other sources for autocorrection data
With a bit of searching, one can find other sources of data to add to AutoCorrect resources for various languages. Wikipedia, for example, offers lists of commonly misspelled words, such as this one in English, which includes links to Dutch, Hungarian, Portuguese, Spanish and Turkish lists. The structure of the data lends itself easily to reformatting with the search and replace features of a text editor:
alamanya->almanya
aferim->aferin
agrasif->agresif
ağostos->ağustos
ahret->ahiret
ayle->aile
alarım->alarm
atmış->altmış
Copy the data from the Wikipedia page to a text file. Then use search and replace to substitute tabs for the "->" structures, add an appropriate XML header for the memoQ resource and save the file as UTF-8 with an MQRES extension and you have an AutoCorrect list ready for import to memoQ. An example of the Turkish list converted and ready for use in memoQ is available for download here.
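The search-and-replace step can also be scripted. What follows is only a minimal Python sketch, not anything supplied by Kilgray. It assumes that each copied line has the form "wrong->right", that the XML header was taken from a dummy resource exported from memoQ as described earlier, and that the header belongs at the top of the file. The function names and any file names passed in are hypothetical.

```python
def arrows_to_tabs(lines):
    """Turn 'wrong->right' lines into tab-delimited 'wrong<TAB>right' lines."""
    converted = []
    for line in lines:
        line = line.strip()
        if "->" not in line:
            continue  # skip blank lines, headings and other page residue
        wrong, right = line.split("->", 1)
        wrong, right = wrong.strip(), right.strip()
        if wrong and right:
            converted.append(wrong + "\t" + right)
    return converted


def build_resource(header, lines):
    """Join the exported XML header and the converted entries into one text."""
    body = "\n".join(arrows_to_tabs(lines))
    return header.rstrip("\n") + "\n" + body + "\n"


def write_resource(header_path, list_path, out_path):
    """Read header and word list (both UTF-8) and write the combined file."""
    with open(header_path, encoding="utf-8") as f:
        header = f.read()
    with open(list_path, encoding="utf-8") as f:
        entries = f.readlines()
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(build_resource(header, entries))
```

Calling write_resource with the header file, the copied list and an output name ending in .mqres would produce a UTF-8 file to try importing. Entries which offer several alternative corrections are not handled by this sketch and would still need manual attention.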

For German, there is a list of common spelling errors on Wikipedia which can be adapted with very little effort to make this resource.

The English list on the Oxford Dictionaries page can also be adapted without much ado. And there are many others to be found on the Internet.

Merging memoQ AutoCorrect resources
Entries from multiple AutoCorrect lists can be combined in a single tab-delimited file, and duplicates can be removed using Microsoft Excel, for example.

The screenshot above shows a merged German AutoCorrect list opened in Excel. When using the Remove Duplicates function on the Data ribbon, be sure that only Column A is selected in the dialog:


The reason Column B must not be selected is that it contains the desired text after correction, and there may be more than one error entry for a particular word.

After duplicates have been removed from the list, save the file as Unicode text, then import it to memoQ. A similar procedure with Excel may be followed to maintain other memoQ light resources; I do this rather frequently for segmentation exceptions to ensure that the lists for the different language variants I work with remain synchronized. (It would be nice, of course, if Kilgray would create a reasonable light resource manager with such capabilities. It gets tiring to do this so often with stopword lists and other resources.)
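For those who prefer not to use Excel, the duplicate removal on Column A can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a tested replacement for the Excel procedure: the function name is hypothetical, the entries are assumed to be tab-delimited lines without the XML header, the first entry found for a given error text is kept, and the comparison is case-sensitive.

```python
def merge_autocorrect(*lists):
    """Merge tab-delimited entry lists, keeping the first entry per error text."""
    seen = set()
    merged = []
    for entries in lists:
        for line in entries:
            line = line.rstrip("\r\n")
            if "\t" not in line:
                continue  # not a 'wrong<TAB>right' entry
            wrong = line.split("\t", 1)[0]
            if wrong in seen:
                continue  # this error text already has a correction
            seen.add(wrong)
            merged.append(line)
    return merged
```

Only the error text (the equivalent of Column A) is checked, so several different misspellings can still point to the same corrected word.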

Jan 9, 2014

Games agencies play, part 3: blind commitments

Many of us know this late-afternoon scenario:
Dear Mr. Lossner,
We have a translation request for 3 PDF documents with a total of 2500 words to be translated into English. Please let me know quickly whether you can deliver by noon tomorrow.
My response to something like this is usually something to the effect of "What the Hell are you actually asking?" I always smell a trap of some kind, because usually there is one. I've been there many times with this particular project manager. You would think that after several years of playing this game together he would understand that I do need to know what the task really is before I can respond in any reasonable way. But unlike other business partners of mine whom I can trust to make a reasonable assessment of effort and work out reasonable conditions in advance with an end client, this guy is one of those who lives on a wing and a prayer and the hope that some sucker will promise the unreasonable. My response?
That would depend on the text and its format. Would you mind letting me have a look so I can answer your question?
Shortly thereafter I received the three documents. I've dealt with some idiots who will declare something to be too confidential to assign without a commitment, but I tell those jokers to go straight to the Hot Place where they belong. This one will at least trust me to look at something before I can tell him if I can translate it. And what did I see? A nightmare of sorts.

Document #1: a certificate with graphics and a complex layout, perhaps 30 words in total.

Document #2: an extremely complex tabular form with a cell structure that would take me perhaps an hour or two or more to reproduce for that single page. No OCR program will deal adequately with that form, which included strange nested borders I don't even want to try to describe. The text in the form comprised perhaps 100 words, maybe a bit more.

Document #3: seven pages of reasonably complex layout on letterhead, with tables, footnotes and a lot of formatting in the body text. The least bad of the three, but probably at least an extra hour of editing and checking due to the formatting.

Oh yes... all of these were scanned documents with a few "shadows" and artifacts of the kind known to be troublesome for OCR.

I replied that feasibility would depend, of course, on the customer's willingness to pay the rush rate given that we were at the end of a long business day as well as the willingness to compensate me for the full effort of layout, which could possibly exceed the actual translation costs. I know from past exchanges that layout and graphics inclusion would be wanted.

The expected response arrived soon after: the inquiry had "taken care of itself". I expected as much; indeed, I guaranteed that response by mentioning rush charges, because this particular client has a policy of never applying rush charges no matter how unreasonable a deadline. Their problem, not mine - natural selection will deal with such policies in due course. I really like these people as people, but like too many "service providers" in the translation sector, they stubbornly refuse to accept principles of responsibility and sustainability, so when they tell me they have assigned a staff member to devote three months of full-time effort to a data processing task which could be accomplished in somewhat under an hour if the software and the task were properly understood, I accept their wise contradiction of my suggestion to do otherwise and just smile. And I watch the sand running in the hourglass.

I remember a Heinrich Böll story I read in high school about a crazy office manager who ran around exclaiming Es muss was geschehen! Eventually something did happen. He dropped dead at work. The unthinking, frequently panicked way that people like the PM with the three PDFs do business often reminds me of Böll's story.

"Partners" like this may be very nice people, but with their persistent refusal to provide those with whom they do business the basic, obvious information to do that business properly, they are a dangerous contagion, a risk to the health of your business. They are themselves walking dead, wandering blindly in search of an inevitable final resting place.

Jan 8, 2014

Multiple, separate concordances with memoQ

In the comments of my recent post on the memoQ TM search tool, I mentioned a possibility for using that feature to "de-junk" and simplify concordance searches.


In the example above, I am searching the 2 million translation unit EU DGT TM using text selected in a memoQ 6.2 project. Working this way offers me the following advantages:
  • I can separate the concordances for my project from a big reference dataset I only need for certain lookups.
  • A simple copy command (Ctrl+C) automatically looks up text in either language in the TM search tool.
  • If I want to avoid any possibility of unintended "leakage" of data from certain TMs in the project, selecting them for use in the memoQ TM search tool ensures that their content will never be "accidentally" inserted as an ordinary TM match as I work.
Note also that I am using a feature of memoQ 2013 R2 (6.8) to do searches while working in an older version of memoQ. I could do the same if I were working in a web translation interface for memoQ (which does not allow me to attach my own TMs) or any other translation environment.

I remember an argument with a translation agency owner about a year and a half ago. The man told me quite insistently about his intent to force even translators with memoQ to use the web translation interface so that he could restrict them to the use of the client-specific TMs he maintained. With the use of the TM search tool, a reasonable compromise is achieved for TM data at least. (LiveDocs and termbase access remains a bit more cumbersome, however, though by setting up a dummy project with termbases, corpora and particular TMs attached, one could actually use three separate concordance sets. That could be interesting.)

In any case, the possibility of a separate concordance for handling large data volumes separately from one's main TMs and the possibility of doing this even while using older versions of memoQ may be a reason why those who do not yet want to do their routine work in the latest version or cannot do so can still benefit from upgrading now and installing the latest version alongside the old version(s).

Jan 7, 2014

Cloud 9 for memoQ teams


After the Civil War in the US, there was a saying that Abe Lincoln may have freed all men, but Sam Colt made them equal. A similar thought occurred to me regarding Kilgray when I began testing the new memoQ cloud service announced last month. This convenient, no-hassle "server on tap" really has the potential to level the field between teams of individual translators and agencies with servers.

Kilgray will hold a free webinar on January 22, 2014 for setting up and using the memoQ cloud service. I will also be creating some resources (blog posts, videos, additions to future editions of my memoQ user guide) to share my ideas for how to work effectively with this platform. At a cost of only €120/month for a project manager license, this is a very cost-effective way for teams to access server functions as needed without the heavy investment and risks of setting up and maintaining their own local server. It also promises to be a good alternative for small agencies or corporate departments with limited capital budgets or limited abilities to maintain infrastructure.

The memoQ cloud service currently requires the use of memoQ 2013 R2. Attempts to access the server for administration with an older version of memoQ will result in error messages or a notice of the version needed.


Both web-based translation (in a browser) and client/server translation are possible with this service and are determined by how a project is set up. There are some limitations in the number of licenses which can be subscribed, but this should not cause any real difficulties for a typical small team, particularly if the members have their own memoQ licenses. There is no access limit for memoQ license owners: in addition to any subscribed memoQ cloud licenses, currently any number of translators can connect if they have a valid memoQ license. (This will change at some point, but the access model is still under consideration.)


If you are curious about working with a memoQ server, you can try the service for a month with only $1/€1 charged to your credit card. There is a bit of a learning curve for setting up projects correctly for various purposes, and I expect some users may have difficulties putting together pieces of that puzzle from the various memoQ Server manuals from Kilgray, but I expect that before long there will be a number of useful guidelines for different audiences. Data on the memoQ cloud server are saved for up to three months of a "dormant" period, and if a subscription is to remain inactive for a longer period, a full backup (including all user data and projects) can be made and restored later.

Working with a server requires some different strategies than desktop-based work, and I think it will be important to emphasize some of these differences so that project managers in the teams will understand what can be changed or added after a project is launched and what cannot as well as what alternative approaches are available. Adding a LiveDocs corpus on the server to a project is one such case.

A little different: the server project management window
Translators working with memoQ cloud or any other memoQ 2013 R2 server may also be a little disoriented by changes required in their revision workflows. If a bilingual file of some kind is exported for external review in a server project, it must be re-imported in the server project management window for the project by someone with corresponding rights. The new monolingual option for importing reviewed documents may provide a practical way to revise text externally in some cases and re-import the changes to the project, but this new feature is not trouble-free in all cases and thus should be used with great care.

On the whole, I am extremely encouraged that Kilgray has offered this new service. I have been asking for something very much like this for nearly five years, and what's on offer here greatly exceeds my expectations. I have a few questions about administrative details which I need to ask Kilgray, but from a technical standpoint I really see this as a best case for the company's software as a service.

Jan 4, 2014

TeamViewer and Dragon NaturallySpeaking: currently a bad mix

On New Year's Eve I took delivery of new hardware to support my translation work. This will be the first time in more than a decade that the bulk of my work will not be done on a laptop, but the demands I've put on my hardware in recent years are a bit much for any laptop I'm willing to invest in. Now set with 32 GB RAM, a few SSD drives, souped-up video and other features to make my work go with a bit less hassle, I decided it was time to try the remote access solutions that some of my friends and colleagues have relied on for the past few years. I've been particularly impressed with what one of them does running all the applications on his home system with excellent performance from his desk at work or other remote locations. At last I am ready to do the same.

The new dream machine is still being configured, but I've got memoQ and other useful tools loaded, even SDL Trados Studio 2014 carefully isolated in a well-configured VMware machine to avoid trashing my main system as SDL software always has in the past.

There are, of course, many possibilities for remote access. Because I use TeamViewer sometimes for remote assistance to clients and colleagues and impromptu mini-webinars of an informal nature, I thought I would try the new, improved access in version 9 that one colleague mentioned. Things have looked quite good on the whole.

The only major failure I have experienced has been with voice recognition. I use Dragon NaturallySpeaking sometimes for my translation work, and out of curiosity I decided to try it with a text I had in SDL Trados Studio on the virtual machine on the remote computer. Typing worked just fine in this configuration.

Dictation with DNS was another matter altogether. Sentences were not capitalized at the beginning, and small pauses in my voice caused spaces to be dropped in the text on the remote virtual machine. Now I know that even Trados isn't this bad with dictation, so I repeated the test in a simple word processor on the VM and repeated it in the same word processor in the remote host system. In each case, the problem was the same: failures to capitalize the beginning of sentences and frequent dropped spaces. I had to discipline myself to speak capitalization commands and insert spaces by voice after any pause. Editing by voice was also impossible and had to be done manually with the mouse and keyboard. Word accuracy was as good as ever, but that's not surprising, as that processing all occurs locally. The difficulties are in transmission to the remote system.

I suspect this is a problem to be addressed by TeamViewer rather than Nuance. I am very curious to see whether other remote access solutions have similar difficulties. If anyone else has relevant experience with this, please share it.

Jan 2, 2014

Der Spiegel guts its online English publication


Sparschwein took on a new meaning in recent days as the German publication Der Spiegel announced drastic cuts in the English edition of its Spiegel Online publication. Personally, this is a great disappointment, because Spiegel is one of my two favorite German news brands, the other being Die Zeit.

Although I usually read Der Spiegel in the original German, I've followed the English edition for a few years, and except for occasional poor editorial choices tending toward over-literal translation, I've been pleased with the production. The shutdown of Spiegel's English edition follows similar failures for German news organizations Welt and Bild.

With the unwillingness to pursue the development of viable, international communication platforms, it seems that German business and society may be content to live with the restricting perspectives of reporting from the outside, which often lacks a real understanding of modern Germany and the fact that some time has passed since 1945. The Spiegel Online English edition was a good cultural ambassador, and I'm sorry to see its demise.

The memoQ TM search tool

Release 2 of memoQ 2013 included a new utility which allows memoQ translation memories to be used for lookups, the TM search tool:

When working in other translation environment tools such as SDL Trados Studio or Wordfast, translating text in a word processor or reading PDF files and web pages, selected text can be looked up directly in chosen translation memories and text from the source or target of a translation match can be put in the Clipboard for pasting into the other application. Relevant keyboard shortcuts are:

Ctrl+Shift+Q     Starts the memoQ TM search tool and immediately searches for any text on the Windows Clipboard.
Ctrl+C     Copies new text to the TM search window and executes a search.
Ctrl+Alt+C     Copies the target text of a selected match to the Clipboard.
Ctrl+Shift+C     Copies the source text of a selected match to the Clipboard.
Ctrl+V     Pastes the Clipboard text into another application.

A translation memory selected in the TM search tool cannot be opened or used in memoQ while the search tool is active. An orange lightning bolt is displayed in the TM list of the Search settings to indicate this status. After the search tool is closed, the TM is available again for use in memoQ.

Although the initial version of this tool is quite useful, many users have realized that further refinements of its features would make its application more flexible and effective. Some suggestions so far include
  • selecting/deselecting all TMs
  • filtering TMs by metadata
  • saving and loading profiles (collections of particular TMs and settings)
  • indicating match sources (i.e. TM, preferably with metadata)
A number of other quirks, like the ability to launch multiple instances of the tool, also still need to be sorted out as of Build 52.

I hope that Kilgray will take the further development of this tool seriously and consider how to improve and expand it, perhaps to include remote translation memories as well. The current version of the TM search tool requires a memoQ license on the computer where it is used, but separate licensing could also be quite interesting. This could be useful, for example, in collaborative projects with partners who use different tools and working methods or for those who want to use memoQ translation memories as bilingual concordances. I see the potential for a value-added service here if I can provide such a concordance (for a fee) to an end client, perhaps with some sort of protective encapsulation for the memories provided. Inclusion of termbases and LiveDocs corpora in future versions of the tool could also prove interesting. memoQ could become a reference information packaging platform to create additional communication services for our clients. There are interesting possibilities for mobile applications here as well. But in the meantime I'll settle for the modest improvements in the bullet points above.

Further information on the memoQ search tool can be found in the Kilgray knowledgebase.

Jan 1, 2014

The 2013 translation environment tools survey

From mid-October until the end of 2013, I placed two small survey questions at the top of the blog page and publicized these in a variety of user forums. The questions were similar to two posed in 2010, because I was interested to see how things might have changed. This is, of course, an informal survey with a number of points in its "methodology" wide open to criticism, though its results are certainly more reliable than anything one can expect from the Common Sense Advisory :-) My personal interest here was to get an idea of the background readers here might have with various translation environment tools, because it is useful to know this when preparing posts on various subjects. Here is a quick graphic comparison of the 2010 and 2013 results:

Responses to the question about the number of translation environment tools were very similar in both cases. About half use only one, with between 25 and 30% of respondents using a second tool and increasingly small numbers going beyond that. The question posed covered preparation, translation and checking in projects, so some respondents using multiple tools may be translating and maintaining terminologies and translation memories in only one tool. I am encouraged by this result, as it means that despite changes in the distribution of particular tools, users are exercising good ergonomic sense and predominantly sticking to one for their main work. Everyone benefits from this: translators generally work more efficiently without tool hopping, and more effort is focused on what clients need - a good translation.

In 2010, half the respondents cited the use of some version of "SDL Trados" (more details on this were provided in a later survey); the next highest responses at just under 20% were for Déjà Vu and memoQ. Three and a half years later, Atril's share of users appears to have declined considerably, and the use of memoQ appears to be about on par with SDL Trados Studio. OmegaT, an excellent free and Open Source translation support tool capable of working with translation formats from the leading tools, appears to be doing better than many of the commercial tools in the survey, which should not surprise anyone familiar with that software.


Across continues to be a loser in every way. Despite massive efforts in the low end of the market to promote this incompatible Teutonic travesty and the availability of the client software free of charge to its victims (translators), no real progress has been made in the Drang nach Marktanteil. One would expect that a good solution supported by a competent professional development team and a marketing budget, available free to translators, would easily beat the low-profile OmegaT. And I am sure that this is the case. The case simply doesn't apply to Across, which drives some of the most technically competent translators I know completely berserk. The fact that OmegaT is about twice as popular despite its volunteer development and total lack of marketing budget speaks volumes.

More important than any of the individual figures for translation support tools are some of the implications for interoperable workflows that the numbers reveal. Most of the tools listed support XLIFF, so if you use a tool capable of exporting and reimporting translation content as XLIFF, developing an interoperable workflow for translation and review that will work with the majority of tools will probably not be that difficult. An XLIFF file from SDL Trados Studio or memoQ is usually a no-brainer for translation in Déjà Vu, OmegaT, Cafetran or Fluency, for example, and any concerns can be checked quickly with a "roundtrip test" using pseudotranslation or simply copying the source text to the target.
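To make the idea of a roundtrip test concrete, here is a minimal Python sketch of the "copy source to target" step on a generic XLIFF 1.2 file. It is only an illustration: in practice the translation tool being tested would do this itself. The function name is hypothetical, and the sketch has not been checked against the proprietary extensions which SDL Trados Studio or memoQ add to their XLIFF exports.

```python
import copy
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
TAG = "{" + XLIFF_NS + "}%s"


def copy_source_to_target(xliff_text):
    """Return XLIFF 1.2 text in which every target mirrors its source."""
    ET.register_namespace("", XLIFF_NS)
    root = ET.fromstring(xliff_text)
    # Collect the units first so the tree is not modified while iterating.
    for unit in list(root.iter(TAG % "trans-unit")):
        source = unit.find(TAG % "source")
        if source is None:
            continue
        old = unit.find(TAG % "target")
        new = copy.deepcopy(source)  # keeps inline tags such as <g> or <x/>
        new.tag = TAG % "target"
        new.attrib.clear()
        if old is not None:
            # Replace the existing target in place, keeping its attributes.
            new.attrib.update(old.attrib)
            new.tail = old.tail
            index = list(unit).index(old)
            unit.remove(old)
        else:
            index = list(unit).index(source) + 1
        unit.insert(index, new)
    return ET.tostring(root, encoding="unicode")
```

If the file produced this way re-imports cleanly into the original tool and exports a target document identical to the source, the tags and structure have probably survived the trip.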

While individual tools have largely improved in their mutual compatibility and ability to share translation and resource data, there is legitimate continuing concern about the increased use of translation servers by translation agencies and corporations with volume needs who manage their own translation processes. Jost Zetzsche and I have expressed concerns in the past regarding the lack of compatibility between server platforms and various clients, though with the appropriate use of exchange formats, this can still be overcome.

The greatest challenge I have seen with server-based work is that the people creating and "managing" projects on these servers often lack a basic understanding of the processes involved, so that the skills of the translators competent with a particular client tool may be effectively nullified by an incompetently prepared job. I experienced this myself recently where segmentation, termbase rights and even the source language were set wrong on the server, and the project manager had no idea how to correct the situation. However, things worked out in the end, because I had a playbook of strategies to apply for such a case. In the end, better training and a good understanding of the interfaces to the processes our partners use can get us past most problems.