Dec 31, 2016

The Dark Secret of memoQ interface switch-hitters

The memoQ world looks very different in the typical view of a translator versus that of a project manager.

Chaos Central: the memoQ Translator Pro dashboard
Ready to work: the project manager interface of memoQ
These differences sometimes lead to difficulties in training or documenting procedures for others. It can be quite annoying, for example, for the students in a university translation technology course to watch their professor demonstrate a technique in the project manager version as they stare at their local copies of memoQ Translator Pro in the computer lab and try to repeat the steps in that different environment.

Project managers who never or seldom use the Translator Pro version often sow unintended confusion when they explain to a translator who isn't a memoQ power user how to perform certain tasks; it really would help if both were looking at the same setup.

I used to spend a lot of annoying time changing the license in my copy of memoQ when I needed to change from one edition to another for purposes of teaching or writing help instructions for the blog or elsewhere. No longer.

A few years ago, memoQ developers and testers got tired of the same problem, so Kilgray made it possible to switch easily back and forth between the edition views. The only catch, as far as I know, is that you must be using a PM license, such as the one that is included with the memoQ cloud service account.

Switch-hitting with memoQ is very simple, just a few little steps. These are:
  1. Create an XML file named ClientDevConfig.xml with the following content: 
  2. Copy this file into the path: C:\ProgramData\MemoQ. Note that ProgramData is generally a hidden folder, so you'll have to deal with that. If I have to tell you how, you probably shouldn't be doing this :-)
  3. Restart memoQ. There will be a new menu group at the far right of the ribbon menu. Perhaps you noticed that already in the screenshots above. This addition may appear in two different forms:

That's it. So if you are a memoQ trainer or a project manager who needs to explain something to a frustrated translator in a visual context they can relate to, you have another tool to help you with that. This has been a great boon to me for the past few years, and I am very glad to have it in my tattered bag of tricks.

Dec 28, 2016

Go Figure (with memoQ!)

When translating patents, legal briefs, reports, manuals and many other kinds of documents I inevitably encounter figure references to photographs and illustrations in the text as well as the labeled captions for these. In this morning's translation of a petition in a nullity suit, one such reference takes the form in Verbindung mit Figur 1,  but it might just as well appear as

Fig. 1
Fig 1
Abb. 1
Abbildung 1

in this or some other text; in documents with multiple and/or sloppy authors I might even find a mix of all these in the same text.

As I value consistency in writing even when the client might not care, I try to translate all of these to the same form in English where it makes sense to do so. That might be Figure 1 or Fig. 1 depending on the situation and the style guide stipulated for the project.

But when I finish the 10,000 or so words for this job and need to do my final check before sending it to the client, I expect to be a little tired, and I want to use my attention and energy to focus on the accuracy and reading comfort of my translation. In doing so I tend to miss little details like the occurrence of "Fig. 1" on page 32 as opposed to "Figure 1" on the other 40 pages. That is why I use the QA feature of memoQ to check the consistency with which I have translated the figure references as well as other matters such as the accurate use of special terminology for the project.

The specific feature I use here for quality assurance is

an auto-translation rule set (aka "autotranslatables"), which is highlighted and selected in the screenshot of the project's settings above.

As I have stated many times before, autotranslatables should be used, but not created by the average translator. Aside from the fact that the regular expressions involved are not particularly easy even for most of the nerds among us, there are a lot of little subtleties that make the difference between a well-functioning rule set and annoying garbage, and even the "experts" struggle with this for sophisticated rules.

But the present example of Figure mapping is a comparatively simple case which can illustrate the principles and some of the "risks" to mere mortals.

My rule set for mapping figures from many German forms to a particular English form consists of a single rule.

All of the possibilities that I expect in German are compiled in a list, along with the English expression for each, and this translation pair list is named #figurelist# and is found on the corresponding dialog tab in the memoQ rule set editor for autotranslatables. (I usually edit rules externally in Notepad++ where I can comment them liberally, but in this case I felt no need to do so.) This named list is used as a variable in the regular expression for the rule to describe a source text match.


Jeepers. That regex for the source text looks complicated, doesn't it? Wouldn't (#figurelist#) \d+ be just as good? After all, it seems to work just fine. Well, except that the list would need a few extra entries to account for abbreviations with and without periods.

No. "(#figurelist#) \d+" is total, incompetent crap. Here are some reasons why:
  • It is more efficient to express the possibility of a period after the text for "Figure" with the regex "\.?",  because you'll never have to worry about abbreviations with or without periods in your lists. Mine will get longer, as I'll probably expand these rules to cover Portuguese as well and use the same rule for both Portuguese and German sources.
  • There may or may not be a space or even extra spaces after the Figure expression. Simply typing a standard space after the (#figurelist#) group means that it must be present and it must be an ordinary space to match. If it's missing or someone typed a non-breaking space (a reasonable thing to do to keep both parts of "Figure 1" on the same line), the rule will not work! Using \s* to express the possibility of 0 to n whitespace characters after "Fig." or whatever is in fact the right way to go (\s+ would insist on at least one).
  • If you test the "simple" crappy regex, you'll also find that "Abb. 14" gives two results: Figure 1 and Figure 14. That is because the rule does not stipulate that the second part must be a whole "word", so the substring match with the first character also gives a result. Bad, bad, bad. The chaos that this sort of mistake can cause with more complex rules like currency expressions used in important financial translations is frightening.
The regex for the result also appears more complex than it should be, but there is a reason behind that as well. Instead of the simple $1 $2 (first group followed by a space followed by the second group), I specified output with a non-breaking space, because it looks rather unfortunate to have a line wrap in the middle of the expression for a figure. One sees that a lot, because it's a nuisance to remember to type non-breaking spaces all the time on the keyboard. This rule can also be used to check the use of the non-breaking space; an ordinary space will generate a warning when the memoQ QA profile is run with the autotranslatables check activated.
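
To make the three objections concrete, here is a minimal sketch in Python. It is not the rule from my memoQ rule set: Python's re module stands in for the .NET engine that memoQ uses, the FIGURE_WORDS dictionary is a hypothetical stand-in for #figurelist#, and the function name is invented for illustration.

```python
import re

# Hypothetical stand-in for the #figurelist# translation pairs
FIGURE_WORDS = {
    "Abbildung": "Figure",
    "Abb": "Figure",
    "Figur": "Figure",
    "Fig": "Figure",
}

# Longest alternatives first, so "Abbildung" is tried before "Abb"
_alternatives = "|".join(
    sorted(map(re.escape, FIGURE_WORDS), key=len, reverse=True)
)

# The "simple" rule: one mandatory ordinary space, no period, no word boundary
NAIVE = re.compile(rf"({_alternatives}) (\d+)")

# Optional period, any amount of whitespace (including none or a
# non-breaking space), and the number must end at a word boundary
ROBUST = re.compile(rf"\b({_alternatives})\.?\s*(\d+)\b")

NBSP = "\u00a0"  # non-breaking space for the target text


def map_figures(text: str) -> str:
    """Rewrite German figure references as 'Figure<NBSP>n'."""
    return ROBUST.sub(
        lambda m: f"{FIGURE_WORDS[m.group(1)]}{NBSP}{m.group(2)}", text
    )
```

With these definitions, NAIVE.search("Abb. 14") finds nothing at all because of the period, while map_figures("siehe Abb. 14 und Fig.2") rewrites both references with a non-breaking space before the number.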

There are many ways in which regular expression rule sets can enhance the user experience and the quality of translation results when working in memoQ. It is not hard to use these rules, but it is beyond most users to create and maintain their own rule sets. Therefore
  • Kilgray should include more useful examples of rule sets (in addition to the very helpful number rules) in future releases of memoQ
  • The average user should ask the help of Kilgray Support for simple rules they need (in most cases this would fall under the usual commitment of paid support and maintenance for the year)
  • memoQ users should work with Kilgray's Professional Services department or other competent consultants to devise robust rule sets to boost their translation and quality assurance productivity. Beware of casual advice found in forums or social media; much of it does not consider issues like the problems described above despite the aggressive insistence one might see for a particular "solution". Truly, you get what you pay for :-)

Post scriptum:
An yet ye hack by night and sun, the work of regex be never done.
Of course something was forgotten in the example here. The myriad styles and customs of source text authors will inevitably offer up challenging variants to break your well-crafted rules. Today's is a text full of figure references like Abbildung 4.12, which would refer to the twelfth figure in the fourth chapter. For this the modified rule might be 


Or perhaps not quite. Try it and you'll see a few problems. This is just another example of why it is good to make use of professional resources to help you with these challenges and to have a systematic way of recording and elaborating them. I'll explain more about such an effective system for planning and documentation in a future article. I've noticed that the "experts" in the translation field often care little for the usual standards of project specification, perhaps because they are sick and tired of translation projects with so many specification documents for those who know better.
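
For readers who want to experiment with chapter-style numbers outside memoQ first, here is one possible sketch in Python. It is not the rule referred to above and has not been tested in memoQ; the word list is abbreviated and assumed, and the function name is made up for the example.

```python
import re

NBSP = "\u00a0"  # non-breaking space for the target text

# Assumed, abbreviated list of German figure words; extend as needed.
# The number part accepts "4", "4.12", "4.12.3" and so on, but a period
# that is not followed by a digit is left alone as sentence punctuation.
DOTTED = re.compile(
    r"\b(?:Abbildung|Abb|Figur|Fig)\.?\s*(\d+(?:\.\d+)*)\b"
)


def map_dotted_figures(text: str) -> str:
    """Rewrite references such as 'Abbildung 4.12' as 'Figure<NBSP>4.12'."""
    return DOTTED.sub(lambda m: "Figure" + NBSP + m.group(1), text)
```

Variants such as ranges ("Abb. 4.12 bis 4.14") or letter suffixes ("Abb. 3a") are deliberately not handled here, which is exactly the sort of gap that real texts will find for you.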

Dec 27, 2016

Free shareable, searchable glossaries for collaboration with anyone

Some years ago I suggested a procedure using Google spreadsheets for glossary collaboration in projects. Many people do this sort of thing now.

What I do not think most are doing, however, is accessing these web-based term lists efficiently as terminology resources in their work. It's hard to compete with the efficiency of integrated termbases, TMs, web search features, etc.

... unless of course you integrate a web search for those online spreadsheets which returns just the few data of interest.

Matches found for German "ladepresse" in a glossary of a few thousand hunting terms
This is fairly straightforward using Google's visualization API with a simple query. A parameterized URL can be built to perform custom searches of your own data or data shared by colleagues or clients. "Canned" queries can be easily incorporated in custom searches from many tools, including memoQ Web Search, IntelliWebSearch and others.

Building a custom search URL for your Google spreadsheet is fairly simple. In the example above it consists of three parts:

{base URL of the spreadsheet} + /gviz/tq?tqx=out:html&tq= + {query}

The middle part (/gviz/tq?tqx=out:html&tq=) invokes the Google visualization API and specifies that the query results be returned as HTML (for display in a browser). The query language is similar to SQL, but if you use a prepared query for a given spreadsheet table structure, you don't need to learn any of that. Queries can be made which also return definitions, images, context examples or anything else that might reside in columns of interest in the online spreadsheet.
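
As a rough illustration of how the three parts fit together, here is a small Python sketch. The spreadsheet address is a placeholder, the column letters (source terms in A, target terms in B) are assumptions about the table layout, and the function name is invented; check the query against Google's own documentation before relying on it.

```python
from urllib.parse import quote

# Fixed middle part: call the visualization API and ask for HTML output
API_SUFFIX = "/gviz/tq?tqx=out:html&tq="


def glossary_search_url(base_url, term, term_column="A",
                        return_columns=("A", "B")):
    """Build a search URL for a term in a shared Google spreadsheet.

    base_url is the address of the spreadsheet itself; the columns are
    assumptions about where the source and target terms are stored.
    Terms containing apostrophes are not handled in this sketch.
    """
    query = (
        f"select {', '.join(return_columns)} "
        f"where lower({term_column}) contains '{term.lower()}'"
    )
    # The query must be URL-encoded before it is appended
    return base_url.rstrip("/") + API_SUFFIX + quote(query, safe="")


# Placeholder address; substitute the base URL of your own spreadsheet
url = glossary_search_url(
    "https://docs.google.com/spreadsheets/d/SPREADSHEET_ID", "Ladepresse"
)
```

In a tool like memoQ Web Search or IntelliWebSearch, you would normally paste the fixed part of such a URL into the search configuration and let the tool substitute the search term itself.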

Using a tool like IntelliWebSearch or integrated extensions of OmegaT, memoQ and other tools, users working with any sort of tools can share a live glossary. Google Spreadsheets also have some permissions/security features which can be investigated if needed.

Of course other data can be shared this way, including TMs or XLIFF data as well as monolingual information. A little study of the relevant Google documentation reveals many possibilities :-)

Getting the picture with automated web searches

Like many other translators, I have come to appreciate the value and the complications of Internet searches in my work. As the garbage accumulated on the World Wide Web grows ever deeper, focused searches are more important than ever to get past the noise to find the information required, then get back to work.

Integrated tools for focused searches on multiple web sites are popular with many. IntelliWebSearch (IWS), memoQ Web Search and similar tools can be an enormous boost to productivity. But I doubt that many people give much thought to optimizing that possibility in general or for particular jobs.

Google searches are very popular. The Advanced Search features are particularly useful. For example, I find translating Austrian legal texts to be difficult sometimes, because an ordinary Google search of relevant legal terms yields too much interference from sites in other German-speaking countries. However, a search configured like this:

will yield only results in German from the Austrian site Jusline, which is very helpful if I am looking for the specific definition of "schwerer Betrug" in the jurisprudence of that country.

Similarly, a financial translator working with Austrian texts might use a search like

In my technical work, very often I must look for images of a component or process described. For a long time I did this inefficiently: searched Google and then clicked the Images link and waded through the chaos to find what I needed. But if I am translating the catalog of the hunting supplier Frankonia, that's stupid. I can do a very specific search like this:

which will open a Google Images search directly (that's what the argument tbm=isch does), using only pictures culled from the site of the retailer whose material I am working on.
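
The searches described here can also be assembled programmatically. The following Python sketch shows one way to do it; the function name is invented, the domain names frankonia.de and jusline.at are my assumptions for the sites mentioned, and the lr=lang_de language restriction is a standard Google parameter rather than something taken from my own search configuration.

```python
from urllib.parse import urlencode


def google_search_url(term, site=None, images=False, language=None):
    """Build a Google search URL, optionally restricted to one site,
    to one language and/or to image results."""
    query = term if site is None else f"{term} site:{site}"
    params = {"q": query}
    if language:
        params["lr"] = f"lang_{language}"  # e.g. lang_de for German
    if images:
        params["tbm"] = "isch"  # open the Images results directly
    return "https://www.google.com/search?" + urlencode(params)


# Image search limited to the retailer's site (domain is an assumption)
image_url = google_search_url("Ladepresse", site="frankonia.de", images=True)

# German-language results from one Austrian legal site (domain assumed)
legal_url = google_search_url("schwerer Betrug", site="jusline.at",
                              language="de")
```

For a preconfigured search in a tool, everything except the search term stays fixed, so the term is simply replaced by the placeholder that the tool expects.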

An image search can often be very helpful to identify an unknown term and navigate to related articles in various languages. For example, a person encountering an unknown word in Russian might use this search: собака&

and quickly see what the term is about.

The search results above were obtained with memoQ Web Search, where I have the Wikipedia image search preconfigured:

Astute readers may notice the slight difference in syntax between the search in the screenshot and the Russian example I gave. There is more than one way to skin a cat with web searches. Or a dog in this case. To restrict searches to the wiki for one particular language just add the prefix for that subsite to the URL, for German, for example.

If you need to do such searches from many different applications under Windows, IntelliWebSearch might be a better choice for the preconfigured searches. I think it also handles a lot of tabs better, and it uses the ordinary browser setup instead of the more restricted options of memoQ's integrated mini-browser. I don't really like the fact that IWS keeps adding tabs to the browser, so I close it between searches, and to avoid messing up other work I am doing in Chrome (my default browser), I configure IWS to use another browser like Opera or Microsoft Edge.

Anyone who would like the light resource file for one of my German/English profiles for memoQ's web search can get it here. It includes the image search in Wikipedia and has a number of (mostly deactivated) custom search tabs useful for intellectual property translation. A few of the searches are for engines which require manual input of terms, but I find it convenient to have these on a tab for quick access.

Dec 26, 2016

The challenge of too many little files to translate

It seems to me that most translators face this challenge eventually: a customer has many small files of some kind - tiny web pages perhaps or other content snippets in XML, text or Microsoft Word files or perhaps even in some bizarre proprietary format - and wants them translated.

Imagine a dictionary project with thousands of words with their definitions, each "entry" being stored in a separate text file. How would you translate that efficiently?

The brute force method of opening and translating each file individually is not very satisfactory. Not only does this take a long time, but when I have tried foolishness like that I tend to overlook some files and spend far too much time checking to ensure that nothing has been overlooked. And QA measures like spellchecking? Let's change the subject....

Some translation tools offer the possibility to "glue" the content of the little files together and then (usually) "unglue" them later to reconstitute the original structure of little files, now translated.

Other tools offer various ways to combine content in "views" to allow translation, editing, searching and filtering in one big pseudofile. This is very convenient, and this is the method I use most often in my work with memoQ or SDL Trados Studio after learning its virtues earlier as a Déjà Vu user.

Unusual file formats can often be dealt with the same way after some filter tweaking or development. But sometimes....

... there are those projects from Hell where you have to ask yourself what the customer was smoking when he structured his data that way, because some other way would be so much more practical and convenient... for you. Ours is generally not to question why some apparently insane data structure was chosen but to deal with the problem as efficiently as possible within budget and charge appropriately for any extra effort incurred. Hourly fees for translation rather than piece rates certainly have a place here.

Sometimes there is a technical solution, though it may not be obvious to most people. For example, in the case presented to me by a colleague on Christmas Eve

the brief was to write the translation in language XX in the empty cell in that column of the 3x2 table embedded in a DOCX file. There were hundreds of these files, each containing a single word to translate.

If these were Excel or delimited text files, a simple solution would have been to use the Multilingual Delimited Text Filter for memoQ and specify that the first row is a header. But that won't fly (yet) for MS Word files of any kind.

In the past when I have had challenging preparation to do in RTF or Microsoft Word formats - such as when only certain highlighted passages are to be translated and everything else is ignored - I have created macros in a Microsoft Office application to handle the job.

But this case was a little different. The others were always single files, or just a few files where individual processing was not inconvenient. And macro solutions often suffer from the difficulty that most mere mortals fear to install macros in Microsoft Word or Excel or simply have no idea how to do so.

So some kind of bulk external processing is called for. In this case, probably with a custom program of some kind.

I usually engineer such solutions with a simple scripting language - a dialect of the BASIC language which I learned some 45 years ago - using a free feature which is part of the Microsoft Windows operating system: Windows Scripting Host. And one-off, quick-and-dirty solutions with these tools do not require a lot of skill. The components of many solutions can be found on Microsoft Help pages or various internet forums with a little research if you have only a vague idea of what to do.

In this case, the tasks were to
  1. Select the files to process (all 272 of them)
  2. Open each file, copy the English word into the empty cell next to it
  3. Hide all the other text in the file so that it can be excluded from an import into a working tool like Déjà Vu, memoQ or SDL Trados Studio (using the options for importing Microsoft Word files in this case; the defaults usually ignore hidden text on import)
After that the entire folder structure of files could be imported into most professional translation support environments and all 300 or so words to translate could be dealt with in a single list view.

A more detailed definition of the technical challenge would include the fact that to manipulate data in some way in a Microsoft Office file format, the object model for the relevant program would probably have to be used in programming (for XML-based formats there are other possibilities that some might prefer).

Microsoft kindly makes the object models of all its programs available, usually for free, and there is a lot of documentation and examples to support work with them. That may in fact be a problem: there is a lot of information available, and it is sometimes a challenge to filter it all intelligently.

In this case, I needed to use the Microsoft Word object model. It also conveniently provided the methods I needed to create the selection dialog for my executable script file. The method I knew from the past and wanted to use at first is only available to licensed developers, and I am not one of these any more.

It is easy to find examples of table manipulation and text alteration techniques in Microsoft Word using its object model in VBScript or some other Microsoft Basic dialect like Visual Basic for Applications (VBA). The casual dabbler in such matters might run into some trouble using these examples if there is no awareness of differences between these dialects; trouble is often found where VBA examples that declare variables by type (example: "Dim i as Integer") occur. Declarations in VBScript must be untyped (i.e. "Dim i"), so a few changes are needed.

In this case, the quick and simple solution (documentary comments begin with an apostrophe) to make the files import-ready was:

' We have a folder full of DOCX files, each containing
' a three-column table where COL1 ROW2 needs to be copied to COL2 ROW2
' and then the COL1 ROW2 and other content needs to be hidden.

Option Explicit

Dim fso
Dim objWord
Dim WshShell
Dim File
Dim objFile
Dim fileCounter
Dim wrd ' Word app object
Dim oFile  ' Word doc object
Dim oCell1  ' first cell of interest in the table
Dim oCell2  ' second cell of interest in the table
Dim oCellx1  ' other uninteresting text
Dim oCellx2  ' other uninteresting text
Dim oCellx3  ' other uninteresting text 
Dim oCellx4  ' other uninteresting text 

fileCounter = 0

'set the type of dialog box you want to use
'1 = Open
'2 = SaveAs
'3 = File Picker
'4 = Folder Picker
Const msoFileDialogOpen = 1

Set fso = CreateObject("Scripting.FileSystemObject")
Set objWord = CreateObject("Word.Application")
Set WshShell = CreateObject("WScript.Shell")

'use the path selected in the SelectFolder method
'set the dialog box to open at the desired folder

With objWord.FileDialog(msoFileDialogOpen)
   'set the window title to whatever you want
   .Title = "Select the files to process"
   .AllowMultiSelect = True
   'Get rid of any existing filters
   .Filters.Clear
   'Show only the desired file types
   .Filters.Add "All Files", "*.*"
   .Filters.Add "Word Files", "*.doc;*.docx"
   '-1 = Open the file
   ' 0 = Cancel the dialog box
   '-2 = Close the dialog box
   'If objWord.FileDialog(msoFileDialogOpen).Show = -1 Then  'long form
   If .Show = -1 Then  'short form
      'Set how you want the dialog window to appear
      'it doesn't appear to do anything so it's commented out for now
      '0 = Normal
      '1 = Maximize
      '2 = Minimize
      'objWord.WindowState = 2

      'the Word dialog must be a collection object
      'even with one file, one must use a For/Next loop
      '"File" returns a string containing the full path of the selected file
      For Each File in .SelectedItems  'short form
         'Change the Word dialog object to a file object for easier manipulation
         Set objFile = fso.GetFile(File)
         'Reuse the Word instance created above
         Set wrd = objWord
         wrd.Visible = False
         'Documents.Open returns the opened document
         Set oFile = wrd.Documents.Open(objFile.Path)

         Set oCell1 = oFile.Tables(1).Rows(2).Cells(1).Range  ' EN text
         oCell1.End = oCell1.End - 1  ' leave out the end-of-cell marker
         Set oCell2 = oFile.Tables(1).Rows(2).Cells(2).Range  ' Target (XX)
         oCell2.End = oCell2.End - 1
         oCell2.FormattedText = oCell1.FormattedText  ' copies EN>XX
         oCell1.Font.Hidden = True  ' hides the text in the source cell

         ' hide the other cell texts (nontranslatable) now
         Set oCellx4 = oFile.Tables(1).Rows(2).Cells(3).Range
         oCellx4.Font.Hidden = True
         Set oCellx1 = oFile.Tables(1).Rows(1).Cells(1).Range
         oCellx1.Font.Hidden = True
         Set oCellx2 = oFile.Tables(1).Rows(1).Cells(2).Range
         oCellx2.Font.Hidden = True
         Set oCellx3 = oFile.Tables(1).Rows(1).Cells(3).Range
         oCellx3.Font.Hidden = True

         ' save the changes and close the document before the next file
         oFile.Save
         oFile.Close
         Set oFile = Nothing
         fileCounter = fileCounter + 1
      Next
   End If
End With

'Close Word
objWord.Quit
Set wrd = Nothing
Set objWord = Nothing
' saying goodbye
msgbox "Number of files processed was: " & fileCounter

The individual files look like the above screenshot (all text in the top row is hidden, so the entire row is invisible, including its bottom border line) after processing with the script, which is saved in a text file with a *.vbs extension (it can be launched under Windows by double-clicking):

Of course the script could be made much shorter by declaring fewer variables and structuring in a more efficient way, but this was a one-off thing where time was of the essence and I just needed to patch something together fast that worked. If this were a routine solution for a client I would be a bit more professional, lock the screen view, change to some sort of "wait cursor" during processing or show a progress bar in a dialog and all the other trimmings that one expects from professional software these days. But professional software development is a bit of a bore after so many decades, and I haven't got the patience to see the same old stupid mistakes and deceits practiced by yet another generation of technowannabe world rulers. I just want to solve problems like this so I can get back to my translations or go play with the dogs and feed the chickens.

But before I could do that I had to save my friend from the Hell of manually unhiding all that table text after his little translation was finished, so I put another 5 minutes (or less) of effort into the "unhiding" script:

Option Explicit

Dim fso
Dim objWord
Dim WshShell
Dim File
Dim objFile
Dim fileCounter
Dim wrd 
Dim oFile  
Dim oCell1  ' source text cell in the table
Dim oCellx1  ' other uninteresting text
Dim oCellx2  ' other uninteresting text
Dim oCellx3  ' other uninteresting text 
Dim oCellx4  ' other uninteresting text 

fileCounter = 0

Const msoFileDialogOpen = 1

Set fso = CreateObject("Scripting.FileSystemObject")
Set objWord = CreateObject("Word.Application")
Set WshShell = CreateObject("WScript.Shell")


With objWord.FileDialog(msoFileDialogOpen)
   .Title = "Select the files to process"
   .AllowMultiSelect = True
   .Filters.Add "All Files", "*.*"
   .Filters.Add "Word Files", "*.doc;*.docx"
   If .Show = -1 Then  
      For Each File in .SelectedItems
         Set objFile = fso.GetFile(File)
         Set wrd = objWord  ' reuse the Word instance created above
         wrd.Visible = False
         Set oFile = wrd.Documents.Open(objFile.Path)

         ' make the source cell and the other table cells visible again
         Set oCell1 = oFile.Tables(1).Rows(2).Cells(1).Range
         oCell1.Font.Hidden = False
         Set oCellx4 = oFile.Tables(1).Rows(2).Cells(3).Range
         oCellx4.Font.Hidden = False
         Set oCellx1 = oFile.Tables(1).Rows(1).Cells(1).Range
         oCellx1.Font.Hidden = False
         Set oCellx2 = oFile.Tables(1).Rows(1).Cells(2).Range
         oCellx2.Font.Hidden = False
         Set oCellx3 = oFile.Tables(1).Rows(1).Cells(3).Range
         oCellx3.Font.Hidden = False

         ' save the changes and close the document before the next file
         oFile.Save
         oFile.Close
         Set oFile = Nothing
         fileCounter = fileCounter + 1
      Next
   End If
End With

' close Word
objWord.Quit
Set wrd = Nothing
Set objWord = Nothing

msgbox "Number of files processed was: " & fileCounter

Dec 15, 2016

Validating Roman numerals in translation QA

The issue of Roman numerals in my translation work has been at the back of my mind for a few years now, but the pain level had not been such that I got around to dealing with it. It comes up time and again in legal translation work: references to the "X. Senat" or the like which mess up segmentation (and require a bit of regex to do a new segmentation rule); references to "Art. VII" of some law (I need to catch the typos like "VIII"); source text errors like "VIIII"; and of course dates like MCMXXIV, etc. and century references.

For simple matters I used regex which would capture and reproduce "Roman numerals", but erroneous data using the right letters would also be accepted:


That is, of course, rather useless for QA which checks the correctness of the expression in the source text. So with a bit of thought I came up with:

Without the word border syntax ("\b"), non-standard expressions like "VIIII" might appear to be validated in the interface of memoQ, for example, because the whole expression would be marked green in the source text, and one might not notice that it was resolved into "VIII" and "I".
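
The difference between merely capturing the right letters and actually validating a numeral can be tried out with a short Python sketch. These are not the expressions from my memoQ rules: the strict pattern is the widely used textbook form for numerals up to 3999, Python's re module stands in for the .NET engine, and the function names are invented for the example.

```python
import re

# Loose: any run of Roman numeral letters standing alone as a word.
# It finds candidates, but it also accepts nonsense such as "VIIII".
LOOSE = re.compile(r"\b[IVXLCDM]+\b")

# Strict: thousands, hundreds, tens and units in standard notation
STRICT = re.compile(
    r"M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
)


def is_valid_roman(numeral: str) -> bool:
    """True if the whole string is a standard Roman numeral (1 to 3999)."""
    return bool(numeral) and STRICT.fullmatch(numeral) is not None


def invalid_roman_numerals(text: str) -> list:
    """Return the candidates in the text that fail strict validation."""
    return [c for c in LOOSE.findall(text) if not is_valid_roman(c)]
```

Checking the whole candidate with fullmatch plays the same role as the word borders: a partial match on "VIII" inside "VIIII" cannot slip through. Note that single capital letters such as "I" or "C" are perfectly valid numerals, so expect false positives in ordinary text.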

These expressions can be used in various ways in any CAT tool that supports regular expressions, such as SDL Trados Studio or memoQ.

If you want this typing aid and QA tool as a memoQ autotranslatable (along with a little demo data file), you can get it here.

Dec 13, 2016

The irregularities of regular expressions in #memoQ

Sometime back in the time-distant swamps where memoQ evolved, regex mysteriously became part of the software's virtual genes. It was unclear, exactly, which third-party engine or bacterial life form had been its source, and solution developers were often at a loss to know which advanced syntax would work or not unless they tried (and very often failed).

Many of us begged and pleaded for some kind of definitive documentation of allowed syntax for memoQ's regular expressions, which are an important feature for filtering (in recent versions), segmentation rules, special text import filters, autotranslatables rules and probably a few other things I've forgotten. But begging, threats - even bribery - led to no useful reference information, just some useless suggestions to read beginner's tutorials for other dialects somewhere on the Web.

Then, quite by accident, I learned yesterday that Kilgray uses the engine in Microsoft's .NET framework. Doh. Who'da thunk? Now, at last, I can get some definitive syntax information to help me solve more sophisticated problems for legal reference formats and other challenges in my translations with memoQ.

Even with accurate syntax guidance (at last!!!), regex development with memoQ is often not a simple matter. The integrated editors are often useless, especially for things like complex autotranslatables, where the bad feature of changing the order of rules after an edit can kill a ruleset. (It was long claimed by Kilgray Support that rule order does not matter, which is patently untrue. They simply did not look at the right test cases.)

Good code of any kind should usually be documented to facilitate maintenance. This is simply not possible with the editors for regex integrated in memoQ. So instead, I do all my rule-writing work in an external editor (such as Notepad++), where I can add extensive <!-- comments so I know what the heck I did when I have to revise the rules later --> and import the rulesets for testing into a memoQ project with appropriate test data included as "translation" documents. The hardest part of this workflow is remembering to enable the imported ruleset I want to test under Project home>Settings>Auto-translation rules; often I forget and think I really screwed up until I go back to the settings and mark the checkbox by the rules to test. Keep a lot of carb sources at your desk when you do regex work. Your brain will need them.

A lot of memoQ users think that regex is irrelevant to their working lives, but for hardcore financial and legal translators at least, this is an entirely mistaken idea. Correctly constructed rules can save much time and a lot of frayed nerves dealing with citations, dates, currency expressions and more, and the rules also decrease QA time while increasing accuracy.

I have quite a number of custom rulesets I have put together for my work and for some colleagues and clients. Regex is hard shit, no matter what anyone tells you. I have programmed computers in a host of languages since 1970 more or less and used to be known for a good memory for syntax rules, but I find regex so non-intuitive at anything more than a very basic level that if I use it only a few times a year, I have to re-learn it nearly every time. That's no fun. So the key to mastering regex is not to learn it. The massahs usually don't know sheet about workin' the fields, but if they are going to survive in this competitive world, they'll know which specialist to put on the job and reward him or her appropriately. Get to know a competent consulting specialist for memoQ regex, like colleague Marek Pawelec, and let that person's expertise save you many hours of typing and QA, not to mention undetected errors.

Kilgray also established a Professional Services department at last not long ago, and that team can also help you with these and other problems for optimizing the use of translation technologies. This is very often a better option than using consultants primarily focused on SDL solutions who do a bit of memoQ on the side, because even the best of these are often not really aware of the best approaches to use, and the consequences of this are sometimes dire. Are they at the memoQ wordface nearly every day, dealing with a wide range of challenges that push the technical envelope of the software to its limits? Or would they really rather do a beginner's workshop for SDL Trados Studio 2017 and show you all the cool features that memoQ has had for years and they probably never learned very well anyway? If it's not the first case, caveat emptor no matter the source.

Nov 20, 2016

Sweet Greek olives come to Portugal

The Good Doctor is widely travelled, and brings back to Portugal many interesting culinary ideas from around the world, using these to complement the traditions of her native land. So when I began to harvest olives from her trees to pickle for the coming year, she looked a little skeptically at the plastic water bottles full of crushed and slit olives and asked me, "Why don't you make sweet Greek olives?"

I had never heard of those before, and she could not tell me much about them except that she had bought some in a shop while driving through Greece some years ago, and they were rather good, so she would prefer that I make some of those instead of the usual spiced pickles all the local farmers do. OK, I said, and began to look for information on the Internet. Nothing useful was found in searches using terms in English, German and Portuguese. I found some pages talking about candied olives made from pickled ones, but nothing useful describing the process starting with fresh olives.

What to do? I asked a Greek colleague for help, and a few minutes later, she sent me a link to a web page in Greek which describes making sweet olives and olive jam.

Since I can't do much more with Greek than sound the words out and search my brain for possible derivates in a language I know, it wasn't clear to me if I needed to work with any particular sort of olives, and I thought the suggested extraction time to remove the bitter elements from the raw olives was optimistic at best, so I took notes and prepared to "transcreate" the recipe for the olives I have available (based on my past experience picking them) and my own preferred approach to scaling recipes. Thus I arrived at the following recipe:

Azeitonas doces de Elvas
  1. Gather ripe, dark olives, de-stem and rinse them, then place them in clean one- to two-liter plastic bottles. Fill the bottles with fresh, cold water and cap them.
  2. Change the water daily for about two weeks, testing the bitterness of the olives until it is reduced to an acceptable level. The time needed will vary according to the olive variety, the degree of ripeness and your personal taste. The Greek recipe this one is based on suggests four or five days' time with daily water changes, but that is simply too little time for my olives and my taste.
  3. After the olives are debittered, cut the tops of the plastic bottles to remove the olives. Then use a de-pitter (a descaroçador de cerejas - a cherry pitter - will do the job) to remove the pits from the olives.
  4. Weigh the olives and place them in a saucepan or small pot.
  5. Add the same weight of water to the pan (so for 600 g of de-pitted olives, add 600 ml water).
  6. Add sugar to the pot amounting to 40% of the weight of the olives (which would be 240 g sugar for 600 g olives).
  7. Bring to a hard boil on high heat, and let the mixture boil for 20 minutes, with occasional stirring. Then remove from heat and allow to rest overnight.
  8. The next day, add more sugar to the pot - 20% of the weight of the olives (so another 120 g of sugar if you are working with 600 g of de-pitted olives). 
  9. Boil the mixture hard for another 20 minutes until the syrup thickens. Then remove from heat.
  10. Can the sweet olives in sterilized jars following the usual hygienic procedures or serve them fresh, warm or cold.

Nov 12, 2016

Trump this!

The Lay of the Politics Waged by Donald

I am the Trump, o hear my cry!
I'll fight for you to love my Lie.
I'll build a Wall to bend you over,
then take my turn with Vlad and Rover.
Injecting Hope in your back end,
I'll screw you green, but I'm your Friend.

I won't pay tax: I'm not a chump,
no plebe like you, I am the Trump!
I make the jobs, you do the work,
you working slobs, and like a jerk,
I'll keep your pay, 'cause it's my perk.

A plastic wife like mine ain't cheap,
nor my pet dog, Slick Mike the veep,
and Master Vlad, he wants his cut,
I'll keep the cash, he'll take your butt.
Russian winters are so, so cold,
but rampant bears are hot and bold!

In politics I make my luck
by giving Vlad a timely suck
and spread his word so true and pure,
like finest vodka from manure!

I am the Trump! O hear my cry!
I have the codes: prepare to die!

Oct 21, 2016

A day in the life....

One of the things I enjoy most about professional translation is the range of activities and subject matters that one can encounter, even as a specialist in a few domains. I can't say the work is never boring, but when it does drift that way, very suddenly it isn't any more. Quite unpredictably.

Yesterday I typed translations. A bit more than expected after two sets of PowerPoint slides - a small one to translate from German and another to edit the rather acceptable English - turned out to have about 8,000 words of highly specialized slide notes about military command and control structures and the technology of fighting forest fires. (Note to self: no matter how busy you are, always import those presentations into memoQ with the options set to extract every kind of text as well as the bitmap graphics if you have to translate those too. Then do a word count! Appearances can be deceiving.)

Yesterday I dictated translations. The job started out as a bunch of text fragments from slides, where context über alles was the rule, lots of terminology required research, and voice recognition offered no particular advantages, then suddenly it became the translation of a rather long lecture using all that new terminology, and the deadline was tighter than thumbscrews operated by an angry ex-girlfriend. Dragon NaturallySpeaking to the rescue. Not only was this necessary to finish the text in a long workday rather than most of a week, but the more natural style of translation by dictation suited the purpose of the translated presentation particularly well. I could imagine myself in the room with equipment vendors, military commanders, firefighting specialists and freight forwarders, talking about the challenges faced and the technology required to avoid the tragedies of an out-of-control firestorm. And the words came out, transcribed from my voice directly into the target text fields of memoQ, exactly as they should be spoken to that audience. And at the end of that long day my hands still had feeling in them, which would not have been the case if I had typed even a third of the text.

Yesterday I made a specialized glossary to share with a presenter who will travel halfway around the world to lecture with the slides I translated for his talk. Long ago I discovered that the way I produce translations has the potential to provide additional benefits for those who will use my work. Sales representatives might need to write letters to their prospects, discussing their products in a language not mastered as a native, and the vocabulary from my work may help them to improve communication and avoid confusion that might result from using incorrect or simply different words to describe the same stuff. Or an attorney might need a quick overview of the language I used to translate the pleading she intends to file, to ensure that it is consistent with previous efforts and will not complicate discussions with her client. The terminology I research and record for each translation can be exported and reformatted quickly to produce glossaries or more complex dictionaries in a variety of formats suited for purpose. Little time and often a lot of benefits for my clients.

Yesterday I translated bitmap graphics and not only had to deal with the editing tools for that but also had to consider the best strategy for transforming the original German graphics into English ones. Would those charts be translated again into other languages? Would the graphics be re-used in other types of documents, so that I should consider ease of portability in my approach to the translation? And how the Hell do I actually use that new bitmap graphics transcription and substitution for Microsoft Office files which was added to memoQ some time ago and sort out the five charts to translate from the fifty to ignore? (Maybe I should blog the solutions some day.)

And yesterday I was asked to write summaries of large, badly scanned articles so that the equipment manufacturer would understand how its latest technology was discussed by German reviewers. As a kid I had a silly fantasy about getting paid to read, and this is just one of the many ways it unexpectedly came true. But before I get that far, these scanned files needed to be reworked so that they could be read and searched on the screen, so as I described in a guest post on another blog some years ago, I converted them to searchable PDF/A with ABBYY FineReader, which in this case also reduced their size by about 75%. The video below also shows how this works. Strangely, when I describe this procedure to other translators, many of them don't get it, and they go on about converting PDF files into editable MS Word files or plain text, or, God help them, something really stupid like importing PDF files directly into a CAT tool for translation, though none of this really relates to my purpose. Conversions often contain errors, and many texts are harder to interpret when the context of an accurate layout is lost. So "text-on-image" PDF files for translation reference to the original source files are often critical, and for files to summarize or consult sporadically for reference (with many pages to look at and essentially nothing to translate), a searchable PDF is the gold standard for efficient work.

In the course of that day I had to work with two computers linked by remote access using four networks at various times, working in German, English and Portuguese (the latter mostly involving questions to the housekeeper on how to do an online pizza delivery order so I could stay in the office and keep working). I used well over a dozen software applications for necessary tasks. These, and the environments in which they operate, must be balanced carefully for efficient work. And even after some months in my new office, the balance isn't quite as good as I've had it before, and more attention to ergonomics is required.

Some colleagues are nostalgic for the "good old days" when they received a stack of paper to translate and sent off another stack of paper when the work was done, and they had a filing cabinet or a shelf of notebooks full of old work to use as reference material, and boxes of index cards stuffed full of scribbled notes on terminology next to seldom-dusty specialist dictionaries prepared by presumed experts, often full of marginalia commenting on errors or omissions and stuffed with papers bearing other scribbled notes. Not me. Since the day 30 years ago when I laboriously typed a text file full of file folder numbers and content descriptions for my research work and personal papers I have been a big believer in electronic retrieval of information wherever possible, and I miss retyping botched pages just as little as I miss the lines in the post office or the stress of dealing with delivery services.

I suspect that some feel a loss of control with the advent of new technologies in an old profession, and certainly the changes in the business environment for translation since the days of the typewriter often require a very different mentality to survive and thrive. What that mentality is, exactly, is a matter of healthy debate and often misunderstanding - again, because of the great diversity of the profession and the professionals and unprofessionals in it.

The greatest challenges of new technologies that I find are the same as those faced in many other kinds of work and in modern life in general. Filtering the overabundance of input for the few things that are truly of use or interest and maintaining focus and calm amidst omnipresent distractions. Not relying too much on technologies that are far more fallible than most people, even experts, realize or acknowledge. And remembering that a fool with a tool, however many features and failsafes it may offer, remains a fool.

Oct 8, 2016

SDL Trados Roadshow in Lisbon on November 16th!

Next month on Wednesday, November 16th, the SDL roadshow featuring the latest release of SDL Trados Studio will be coming to Lisbon, Portugal. The all-day event is free of charge, but registration is required.

A full afternoon of training on the SDL Trados Studio translation environment is included in the day. Even if you live and work in a country other than Portugal, this is an excellent opportunity to be briefed on one of the leading technologies for efficient translation work and then take a very long weekend to enjoy Europe's capital of cuisine and culture.

See you there?

Oct 7, 2016

Time enough for words

We lack the words, you say,
to describe the journey
through the dark borderlands
at the end of our time,
as once the want of words
for the ordinary
light which fills half the sky
cast shadows on our lives.
But words were there, waiting,
for untrained ears to hear.
School us now in the sounds
of old life's dialect.

Aug 24, 2016

memoQ autotranslatables: a partial antidote for drudgery

I'm currently working on a stack of legal pleadings for a patent nullity suit – lots of "urgent" words to churn by the end of the week. And after 10,000 or so of them, I got pretty damned tired of typing out the translation of text citations of the form "Spalte 7, Zeilen 34 bis 45" as "Column 7, Lines 34 to 45".

In fact, it was really starting to piss me off. In such situations, I try not to get mad but to get an autotranslatable ruleset instead. This is perhaps one of the most under-utilized productivity tools in memoQ.

So the next time I ran into a text that fit that format, the translation was offered as an autocompletable phrase as soon as I typed the first letter:

Of course life isn't usually that simple, at least not life with technology. And authors? Well, they seem to believe firmly in the old saying that "consistency is the hobgoblin of little minds". So of course the text also includes lots of references in the form "Spalte 7, Zeilen 34 - 45", with or without spaces around the hyphen. No problem, just add a rule for that (or if you are more clever, edit the single rule to cover the variations):

Now I am not one to advocate that the unwashed masses of translators – or even the washed ones – run out and learn to write regular expressions. I've programmed more computer languages and systems than I can possibly remember for about 45 years now, and I can't keep most of the autotranslatable rules in my head if I don't use them for a week or more after yet-another-refresher, so it would be stupid and hypocritical of me (or just bloody naive) to expect most people to mess with nerdy shit like this. But....

... a few simple rules and a couple of nice "recipe templates" to start can go a long way. And sometimes it pays not to be too clever; I have one highly sophisticated set of rules for complex legal citations that was written by a professional programmer, and it's unusable. Takes minutes to load even on a very fast computer, which is a huge pain in the backside every time a project is opened in memoQ. My more verbose, brute force approach to legal reference autotranslation may not be elegant, but it loads much faster and covers 90% or more of what I encounter. Maybe a case of where it's smart to be a little stupid.

There are lots of good tutorials out there on regex (regular expressions), including a few YouTube webinar videos from Kilgray, the memoQ Help, a few chapters in old books of mine, discussions in the Yahoogroups lists and more.

The examples above require the knowledge of only a few rules:
  • Chunks of the source text to be analyzed are grouped in parentheses. In the examples shown, those groups are merely where numbers occur.
  • Numbers are represented by the escape code "\d". If there might be more than one digit, add a plus sign: \d+.
  • Spaces are represented by the escape code "\s". In the rules you can usually just type a space instead, but if you have to cover cases where it might be missing or where more than one might have been typed (usual sloppiness), then use the escape code, followed by an asterisk, which means "zero or more" of whatever it is put after: \s*.
  • For the rest of the text to match, you can usually type it just the way it occurs as I have done above. For the target translation rules, you can usually just type the literal text you want, with the groups represented by the numerical order in which they occur, preceded by a dollar sign. So the first group (parentheses set) in the source is $1, the second is $2, etc. Of course the order can be changed in the target; it's just not necessary in this case, but in autotranslatable rules for dates this happens rather often.
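
For anyone who wants to try these few rules outside memoQ first, here is a minimal sketch in Python. It is a reconstruction under assumptions, not a copy of my actual ruleset: the pattern is pieced together from the bullet points above, the non-capturing alternation (?:bis|-) is one possible way to fold the hyphen variant into a single rule, and rendering the hyphen form with "to" is my choice for the illustration. The name autotranslate is made up for the example. Note also that memoQ uses .NET regex syntax, where the target refers to groups as $1, $2, $3; Python writes the same references as \1, \2, \3.

```python
import re

# Assumed source rule, built from the elements described above:
# literal text, (\d+) groups for the numbers, and \s* around "bis" or a hyphen.
SOURCE_RULE = re.compile(r"Spalte (\d+), Zeilen (\d+)\s*(?:bis|-)\s*(\d+)")

# Target rule: literal English text plus the groups in their original order.
# In memoQ this would be written with $1, $2 and $3.
TARGET_RULE = r"Column \1, Lines \2 to \3"


def autotranslate(text):
    """Replace every matching German citation in text with its English form."""
    return SOURCE_RULE.sub(TARGET_RULE, text)


if __name__ == "__main__":
    samples = (
        "Spalte 7, Zeilen 34 bis 45",
        "Spalte 7, Zeilen 34 - 45",
        "Spalte 7, Zeilen 34-45",
    )
    for sample in samples:
        print(sample, "->", autotranslate(sample))
```

All three sample variants come out as "Column 7, Lines 34 to 45". Text that does not fit the pattern is left untouched, which is the behavior you want from a rule like this.
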
Not only will the little rules I wrote for this big job save me a lot of typing, I can also use them in a QA profile to check that I have made no errors by switching numbers, missing a space or anything else in my translation. That is done by marking the appropriate checkbox on the first tab of the QA profile you plan to use:

Perhaps such things are worth a little effort in your projects once in a while....
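
The QA idea mentioned above can also be sketched outside memoQ. This is only an illustration of the principle, assuming the same hypothetical citation pattern as before; it says nothing about how memoQ implements its own check, and the function names are invented for the example. The sketch derives the expected English citation from each match in the source segment and reports any that do not appear in the target.

```python
import re

# Assumed source rule (same illustrative pattern as in the sketch for the rules).
SOURCE_RULE = re.compile(r"Spalte (\d+), Zeilen (\d+)\s*(?:bis|-)\s*(\d+)")


def expected_citations(source):
    """Build the expected English citation for each match in the source segment."""
    return [
        "Column {}, Lines {} to {}".format(column, first, last)
        for column, first, last in SOURCE_RULE.findall(source)
    ]


def missing_citations(source, target):
    """Return the expected citations that are absent from the target segment."""
    return [cit for cit in expected_citations(source) if cit not in target]


if __name__ == "__main__":
    source = "siehe Spalte 7, Zeilen 34 bis 45"
    print(missing_citations(source, "see Column 7, Lines 34 to 45"))
    print(missing_citations(source, "see Column 7, Lines 43 to 45"))
```

The first call reports nothing missing; the second flags the citation because two digits were switched in the target, which is exactly the kind of slip that is easy to overlook late in a long job.
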

Aug 23, 2016

Reminder: web search tutorial this Friday!

Time is running out to register for Michael Farrell's webinar this Friday on the basics of IntelliWebSearch, a scripting tool that runs under Windows and enables multiple, simultaneous web searches using text selected in any application.

I used to be rather sceptical of this sort of tool, but in the past several years (since a similar, less powerful feature was introduced in memoQ) I have found this to be among the greatest contributors to my research and translation productivity. This saves time and reduces my work fatigue over the course of a long day.

The online workshop is free to IAPTI members and very affordable to everyone else (USD 25 or a bit less if you are a member of a partner association).

There will be a more advanced presentation to follow in September, which does not require participation in this one, but which does assume that you know the basics of IWS.

Aug 6, 2016

Approaching memSource Cloud

It has been interesting to see the behavior of my codornizes since I moved them from the confines of a rabbit hutch in a stall at my old quinta to the fenced, outdoor enclosures in the shade of a Quercus suber grove. In the hutch, they were fearful creatures, panicking each time I opened their prison to give water and food or to collect eggs. Their diet was also rather miserable; the German hunters who first introduced me to these birds for training very authoritatively told me that they ate "only wheat", and I felt bold to offer them anything different like cracked corn or rice. In the concentration camp-like conditions in which they lived, they also developed a serious case of mites and lost a lot of feathers. I thought about slaughtering and eating them as an act of mercy.

Then last spring I moved to a new place with a friend, who built a large enclosure for my goats and chickens. She didn't know about the quail. I brought them one day and hastily improvised an enclosure for them with a large circle of wire fence around a tree, because I was afraid the goats might trample them. There was far more space in this area than they had before, and real, dry dirt for taking dust baths. Soon the mite infestations improved (even before regular dunks in pyrethrin solution began), and the behavior of the birds began to change. They became less nervous, though sometimes when someone approached the enclosure they flew straight up in panic as quail sometimes do and bloodied themselves on the wire.

A few months later I built a much larger enclosure for a mother hen and her chicks to keep them out from under trampling feet or from wandering through the chain link fence of the enclosure into the hungry mouths of six dogs who watched the birds most of the day like Trump fans with a case of beer and an NFL game on the TV. The quail were moved in with the chickens as an afterthought. With nine square meters of sheltered space, the three little birds underwent further transformations, becoming much calmer, never flying in panic and allowing themselves to be approached and picked up with relative ease. They also exhibited a taste for quite a variety of foods, including fresh fruit and weeds such as purslane. Most astonishing of all, they began to lay eggs regularly in an overturned flower pot with a bit of dried grass. Nowhere else. All the reading I've done on quail on the Internet tells me that quail are stupid birds who drop their eggs anywhere, do not maintain nests and seem to have no maternal instincts whatsoever. I am beginning to doubt all that.

At various times in my life I have heard many statements made about the cultural proclivities of various ethnic minorities, but these assertions usually fail to take into account historical background and circumstances of poverty and prejudice, choosing instead to blame victims. In cases where I have seen people of this background offered the same opportunities I take for granted or far less than my cultural privilege has afforded me, I cannot see any result which would offer itself for objective negative commentary.

There are a lot of ignorant assumptions and assertions made about the class of digital sharecroppers known as translators. Some of the most offensive ones are heard from the linguistic equivalents of plantation owners, some of whom have long years of caring for these hapless, technophobic, unreliable "autistics" who simply could not survive without the patriarchal hand of their agencies.

Fortunately, technology continues to evolve in ways which make it ever easier to take up the White Man's Burden and extract value from these finicky, "artistic" human translation resources. The best of breed in this sense could make old King Leopold II envious with the civilization they have brought to us savage translators.

On many occasions, I have advocated the use of various server-based or shared online solutions for coordinating translation work with others. And I will continue to do so wherever that makes sense to me. However, I have observed a number of persistent, dangerous assumptions and practices which reduce or even eliminate the value to be obtained from this approach. It's not a matter of the platform per se, usually, unless it is Across to bear, but too often over the past decade, I have seen how the acquisition of a translation memory management server such as memoQ or memSource or a project management tool such as Plunet, OTM or home-rolled solutions has led to a serious deterioration in the business practices of an enterprise as they put their faith more in technology and less in the people who remain as cogs in their business engines.

As the emphasis has shifted more and more to technologies remote to the sharecroppers actually working the fields of words, a naive belief has established itself as the firm faith of many otherwise rational persons. This is expressed in many ways – sometimes as a pronouncement that browser-based tools are truly the future of translation, often in the dubious, self-serving utterances of bottom-feeding brokers and tool vendors who proclaim the primacy of machine pseudo-translation while hiding behind the fig leaf argument that we need such things to master the mass of data now being generated. It is fortunate for them perhaps that this leaf is opaque enough to hide their true linguistic and intellectual potency from public view.

A related error which I see too often is the failure to distinguish between the convenience of process and project managers and the optimum environment for translating professionals. I don't think this mistake is malicious or deliberately ignores the real factors for optimal work as a wordworker; it's simply damned hard much of the time to understand the needs of someone in a different role. I could say the same for translators not understanding the needs of project managers or even translation consumers, and in fact I often do.

So indeed, the best tool for a project manager or a corporate process coordinator might not be the best tool for the results these people desire from their translators. Fortunately, this is usually a situation where, with a little understanding and testing, both sides can win and work with what works best for them. The mechanism to achieve this is often referred to by the nerdy term "interoperability".

Riccardo Schiaffino, an Italian translator and team leader based in the US, recently published a few articles (trouble and memoQ interoperability) about memSource, a cloud-based tool whose popularity among translation agencies and corporate or public entities with large translation needs continues to grow. High-octane translators like Riccardo and others have trouble sometimes understanding why these parties would choose a tool with such great technical limitations compared to some market leaders like SDL or memoQ, but the simplicity of getting started and the convenience of infrastructure managed elsewhere on secure, high-performance servers with sufficient capacity available for peak use is an understandably powerful draw.

And the support team of memSource and the tools developers are noted for their competence and responsiveness, which is equal in weight to a fat basket full of sexy technical options.

So I will not argue against the use of memSource by agencies and organizational users whose technical needs are not particularly complex and who do not have concerns about a tool almost entirely dependent on reliable, high bandwidth internet connectivity at all times to fulfill its key promises. In fact, it's a good and easy place to start for many, perhaps more so than the rival memoQ Cloud at present, which suffers sometimes from limited capacities (at the same data center used by memSource and others!) during peak use. Unlike the barbed-wire, unstable and unfriendly solution Across, which has achieved some popularity in its native Germany and elsewhere through sales tactics relying on fear, uncertainty and doubt regarding illusionary or delusional data security, memSource works, works well, and the data are portable elsewhere if a company or individual makes another choice some day.

But damn... it's just not very efficient for professional work, especially not for those of us who have amassed considerable personal work resources and become habituated to other tools like SDL Trados Studio, Déjà Vu or memoQ like a carpenter is to his time- and work-tested favorite tools. Trading one of these for the memSource desktop editor or, God forbid, the browser-based translation interface feels worse than being forced to do carpentry with cheap Chinese tools cast from dodgy pot metal. Riccardo mentions a few of the disadvantages, and I could fill pages with a catalog of others. But compared to some other primitive tools, it's not so bad, and for those with little or no good experience with leading translation environment tools, it may seem perfectly OK. You don't miss a myriad of filtering options to edit text or sophisticated QA features if you are still amazed that a "translation memory" can spit out a sentence you translated once-upon-a-time if something similar shows up six months later.

And as mentioned, memSource - or some other tool - may indeed be the best solution on the project management side. So what's a professional translator to do if an interesting project is on offer but that platform is unavoidable? Riccardo's tips on how to process the MXLIFF files from memSource in memoQ offer part of a possible good solution which would work almost equally well in most other leading tools as well these days. One additional bit is needed in the memoQ Regex Tagger filter to handle the other tag type (dual curly brackets) in memSource, but otherwise the advice given will allow safe translation of the memSource files in other environments. I can even change the segmentation in memoQ if, as usual, the project manager has failed to create appropriate segmentation rules in memSource to account for some of the odd stuff one often sees in legal or financial texts, and this does not damage or change the segmentation seen later when the working file is returned to memSource.

Even concerns about the "lack" of access to shared online resources in memSource if an MXLIFF is translated elsewhere are easily addressed. A few useful things for this include:

  • pretranslation of the memSource files to put matches into the target before transferring to other environments,
  • leaving the browser-based or desktop editor for memSource open in the background for online term base or TM look-ups, and
  • occasionally exporting and synchronizing the MXLIFF in memSource to make the data available to team members working in parallel on a large project - this takes just a minute or two and allows one as much time as needed for polishing text in the other environment.

The last tip is particularly helpful to calm the nerves of project managers who are like mother hens on a nest of eggs which they fear might in fact be hand grenades and who panic if they don't see "progress" on their project servers days before anything is due. One can show them "progress" every twenty minutes or so without much ado if so inclined.

I am past the point where I recommend any translation memory management server in particular for agency and corporate processes. There are advantages to each (except Across, where these are actually hallucinations) and disadvantages, and where I see real problems, it is seldom due to the choice of platform but rather the lack of training and process knowledge by those responsible for the processes. The bright and shining prospects of a translation server are easily sold with a slick tongue, but without an honest analysis and recommendation of needs for initial and ongoing staff training these too often end up being bright and shining lies. I think very often of a favorite German customer who invested heavily in such a system four or five years ago and has not managed one single successful project with the system in all that time. This makes me sick to think of the waste of resources and possibilities.

So on the project management and process ownership side, memSource may be a great choice. Certainly some of my clients think so, and the improvements in their business often back this belief up. And for those who work with gangs of indigent, migrant or sharecropping translators whose marginal existences make the investment in professional resources like SDL Trados Studio or memoQ seem difficult or undesirable, it may be all that is needed by anyone.

The good news for those who depend on the efficiency of a favored tool, however, is that with a few simple steps, we need not compromise and can get full value from our better desktop tools while supporting interesting projects based in memSource. So each side of the translation project can work with what works best for them, without loss, compromise, risk or recriminations.

And the translating quail who start out in a dark box with a stunting lack of possibilities can look forward to the real possibilities of work liberation in a larger environment richer in healthy possibilities and rewards.