Pages

Aug 22, 2017

A Mox on all their houses!


Alejandro Moreno-Ramos is a translating electromechanical engineer working from French and English to European Spanish who for years has captured the essence of translators' tribulations with his Mox cartoons, which are available online at the Mox blog and from Amazon in two hardcopy collections, Mox: Illustrated Guide to Freelance Translation and Mox II: What they don't tell you about translation.

The blog went quiet about two years ago, but recently Alejandro has begun to publish his cartoons again, to the great delight of his fans. It takes one from the trenches to represent the profession with an honesty and clarity you'll never see from the Common (Non-)Sense Advisory and the rest of the bog. Check out Mox's blog and enjoy!

Aug 21, 2017

Translating Chicago Style with Carol Saller (webinar)

Twenty Years behind QA for The Chicago Manual of Style 

Carol Saller is best known in her role as an editor at the University of Chicago Press, where she was head copyeditor for the 16th edition of The Chicago Manual of Style, and as the author of her often hilarious and enormously helpful book, The Subversive Copyeditor. The book is an eminently cogent response to the thousands of questions that Ms. Saller reads each year from writers and copyeditors in her role as Editor for the Q&A page of The Chicago Manual of Style Online. Many of these writers and editors have reached a stand-off with each other over prickly and sometimes humorous questions of grammar and style. To wit, “My author wants his preface to come at the end of the book. This just seems ridiculous to me. I mean, it’s not a post-face.”

Carol Saller surprises a lot of hardline editors by stressing flexibility when it comes to supposedly hard and fast “rules.” The focus, she seems to feel, should be on clarity for the reader and on a good and useful working relationship between writers and editors… As well as translators and editors!

September 9, 2017 at 4 PM UTC  This webinar will be held in English

Speaker: Carol Saller


REGISTRATION:  info.request@iapti.org

IAPTI members: FREE!
Partner associations: USD 22.00
Non-members: USD 25.00


Aug 14, 2017

Oxford living dictionaries for "other" languages

I had some difficulty deciding how to title this post, given the historically loaded connotations of possible alternatives. The Oxford Dictionaries project does a lot of useful stuff, offering quite a number of monolingual and bilingual dictionaries free and by subscription, which are of great value to editors and translators.

I am particularly excited and encouraged to see bilingual and monolingual resources from Oxford for some common African languages now, such as Setswana, Swahili, Northern Sotho and isiZulu. In recent years it has been a great blessing to meet some African colleagues from Egypt, Nigeria, Kenya, Angola and elsewhere at IAPTI events, memoQfest or other venues. In some of my education support efforts through IAPTI I have found rather interesting resources in South Africa and a few other places, but on the whole it appears to me as an outsider that colleagues there face a relative shortage of resources for any work they might do with local languages not transplanted from Europe. So it is a great pleasure for me personally to discover and share such resources (and I would encourage others to do so as well in the comments below).

The Oxford Global Languages project also features other important languages such as Indonesian, Malay and various Indian languages like Hindi, Gujarati, Tamil and Urdu. And then there are the usual suspects like English and Spanish.

I fell in love with the Oxford English Dictionary as a child, when I found the long shelf filled with its volumes of historical etymology. The dictionaries mentioned and linked here are focused more on current usage of living languages, but they should have much of the same scholarship and rigor that goes into the making of that marvelous OED. Enjoy.

Aug 11, 2017

The memoQ Web Search memory leak fix! (updated again)

A big thank you to Italian veterinary surgeon and translating colleague Claudio Porcellana, who solved the mystery of the memory leak which has plagued users of memoQ's Web Search for years now. While Kilgray developers busily work on alternative engines for fixing future versions, Dr. Porcellana used his head - as impatient Southern Europeans are wont to do.

The problem, it seems, is with troublesome Java applets on sites like Linguee. So he simply turned them off. And plugged the leak.

Kilgray currently uses an Internet Explorer component for memoQ Web Search, so here's the fix:
  1. Start Internet Explorer and open Internet options in the Settings:


     
  2. Go to the Security tab and click the Custom level button:


     
  3. Then find the Scripting section and disable the Java applets:


Leave Active scripting (= JavaScript, etc.) enabled or you will mess up the search for some sites like LEO.

After I made this change, I tested memoQ Web Search. Instead of the usual steady increase in memory consumption I used to observe due to the infamous leak, everything remained rock stable, and all my site searches that I typically use for legal and scientific translation worked just fine.

This fix ought to work with all versions of memoQ since the introduction of the web search feature (in memoQ 2013 R2 I think it was). So thank you, Dr. Porcellana, for making our working lives a little less crash-prone!

UPDATE: Further testing has revealed (as noted in some comments below) that there is more to the story. I was puzzled that some people continued to experience the memory leak unless "active scripting" was also disabled, and at Varga's request I tested again on my system (until then I had been sure his troubles might be tied to a Hungarian system, but it turns out that is in fact not the case). To my astonishment, the problem re-appeared after having previously been eliminated by disabling the Java applet scripting alone. I had to turn off "active scripting" too to achieve stability. And then suddenly the problem went away again.

Puzzling, right? And annoying of course. And then an idea occurred to me: I dug up my Linguee user account password and logged in to Linguee under my user name. I contribute a lot of terms when I search in other browsers, so I have a lot of credit, and this credit is applied as searches without ads.

It's the advertising. Some ads seem to involve Java applets. Other ads do buggy things with scripts that do not use applets. And some ads do neither of these two things and cause no trouble.

Maybe an ad blocker applied to Internet Explorer will fix the problem for memoQ Web Search until the changeover to Chromium occurs in the next version. [No, it does not, alas.] In the meantime, I will achieve stability for today's big job by staying logged in to my Linguee account!

YET ANOTHER UPDATE: As advertisements and the like have been identified as the real source of trouble, one user suggested substituting the Windows hosts file. This approach has a number of advantages apparently; it presumably de-craps your Internet connection by blocking sites that send troublesome content, communicate with spyware, etc. A better hosts file with instructions for where to put it is found at: http://someonewhocares.org/hosts/

Substituted hosts file on my Windows 10 system; the old file was backed up by re-naming it.
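For anyone who prefers to script the swap, here is a minimal Python sketch of the backup-and-replace step. The function name and parameters are my own invention for illustration; the default path is the standard Windows hosts location, which requires administrator rights to modify, so the paths are parameters you can point at a test directory first.

```python
import shutil
from pathlib import Path

def replace_hosts(new_hosts: Path,
                  hosts: Path = Path(r"C:\Windows\System32\drivers\etc\hosts")) -> Path:
    """Back up the existing hosts file by renaming it, then install the new one.

    Returns the path of the backup. Needs administrator rights for the
    real Windows hosts location.
    """
    backup = hosts.with_suffix(".bak")
    if hosts.exists():
        hosts.rename(backup)       # keep the old file under a new name
    shutil.copy(new_hosts, hosts)  # install the downloaded hosts file
    return backup
```

Run it with the file downloaded from the site above as `new_hosts`, then flush the DNS cache (`ipconfig /flushdns`) so the new blocks take effect.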



Aug 3, 2017

"Coming to Terms" workshop materials for terminology mining



I recently put together a two-hour online workshop to teach some practical aspects of terminology mining and the creation and management of stopword lists to filter out unwanted word "noise" and get to interesting specialist terminology faster.

A recording of the talk as well as the slides and a folder of diverse resources usable with a variety of tools are available at this short URL: https://goo.gl/qvwJbf. The TVS recording file can be opened and played by the free TeamViewer application.

The discussion focuses primarily on Laurence Anthony's AntConc and the terminology extraction module of Kilgray's memoQ.
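As a rough illustration of the stopword idea covered in the workshop (not the actual AntConc or memoQ implementation), a few lines of Python show how a stopword list strips the common-word "noise" from a frequency count, leaving likelier term candidates. The sample text and stopword list here are made up for the demonstration.

```python
import re
from collections import Counter

def term_candidates(text: str, stopwords: set[str], min_count: int = 2) -> list[tuple[str, int]]:
    """Count word frequencies and drop stopword 'noise', keeping repeated candidates."""
    words = re.findall(r"[a-zA-ZÀ-ÿ'-]+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return [(w, n) for w, n in counts.most_common() if n >= min_count]

stop = {"the", "of", "a", "and", "to", "is", "in", "by"}
sample = ("The centrifugal pump moves fluid by rotation. "
          "The pump impeller accelerates the fluid, and the casing "
          "of the pump converts velocity to pressure.")
print(term_candidates(sample, stop))  # → [('pump', 3), ('fluid', 2)]
```

Real extraction tools do much more (multi-word terms, stemming, statistical ranking), but the filtering principle is the same: the bigger and better-tuned the stopword list, the faster you get to the interesting terminology.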

Jul 26, 2017

Shortcuts to managing bitext corpora and terminologies in free Google Sheets

When I presented various options for using spreadsheets available in the free Google Office tools suite on one's Google Drive, I was asked if there wasn't a "simpler" way to do all this.

What's simple? The answer to that depends a lot on the individual. Great simplicity is already possible using the application programming interface for parameterized URL searches described in my earlier articles on this topic. But is there a simpler way? The answer is yes. However, there will be some restrictions to accept regarding your data formats and what you can do with them. If that is acceptable, keep reading and you'll find some useful "cookie cutter" options.

When I wrote the aforementioned articles, I assumed that readers unable to cope with creating their own queries would simply ask a nerdy friend for five minutes of help. But another option would be to use canned queries which match defined structures of the spreadsheet.

Let's consider the simplest cases. For anything more complicated, post questions in the comments. One can build very complex queries for a very complex glossary spreadsheet, but if that's where you're at, this and other guns are for hire, no checks accepted.

You have bilingual data in Language A and Language B. These can be any two languages, even the same "language" with some twist (like a glossary of modern standard English with 19th-century thieves' cant from London). The data can be a glossary of terms, a translation memory or other bitext corpus, or even a monolingual lexicon (of special terms and their definitions or other relevant information). The fundamental requirements are that these data be placed in an online spreadsheet, which can be created online or uploaded from your local computer, with Language A found in Column A of the spreadsheet and Language B (or the definition in a monolingual lexicon) in Column B. And to make things a little more interesting, we'll designate Column C as the place for additional information.


Now let's make a list of basic queries:
  1. Search for the text you want in Column A, return matches for A as well as information in Column B and possibly C too in a table in that order
  2. Search for the text you want in Column B, return matches for B as well as information in Column A and possibly C too in a table in that order
  3. Search for the text you want in Column A or Column B, return matches for A/B and possibly C too in a table in that order

Query 1: searching in Column A

The basic query could be: SELECT A, B WHERE A CONTAINS '<some text>'
Of course <some text> is substituted by the actual text to look for, enclosed in the single straight quote marks. If you are configuring a web search program like IntelliWebSearch or the memoQ Web Search tool or equivalents in SDL Trados Studio, OmegaT or other tools, the placeholder goes where <some text> is shown.

If you want the information in the supplemental (Comment) Column C, add it to the SELECT statement: SELECT A, B, C WHERE A CONTAINS '<some text>'

The results table is returned in the order in which the columns are named in the SELECT statement; to change the display order, change the sequence of the column labels A, B and C in the SELECT, for example:  SELECT B, A, C WHERE A CONTAINS '<some text>'

Query 2: searching in Column B

Yes, you guessed it: just change the column named after WHERE. So 
SELECT B, A, C WHERE B CONTAINS '<some text>'
for example.

Query 3: searching in Column A or Column B (bidirectional search)

For this, each comparison after the WHERE should be grouped in parentheses: 
SELECT A, B, C WHERE (A CONTAINS '<some text>') OR (B CONTAINS '<some text>')

The statement above will return results where the expression is found in either Column A or Column B. Other logic is possible: substituting AND for the logical OR in the WHERE clause returns a results table in which the expression must be present in both columns of a given record.

And yes, in memoQ Web Search or a similar tool you would use the placeholder for the expression twice. Really.

Putting it all together

To make the search URL for your Google spreadsheet, three parts are needed:

  1. The base URL of the spreadsheet (look in your browser's address bar; in the address https://docs.google.com/spreadsheets/d/1Bm_ssaeF2zkUJR-mG1SaaodNSatGdvYernsE7IJcEDA/edit#gid=1106428424 for example, the base URL is everything before /edit#gid=1106428424).
  2. The string /gviz/tq?tqx=out:html&tq= and
  3. Your query statement created as described above
Just concatenate all three elements:

{base URL of the spreadsheet} + /gviz/tq?tqx=out:html&tq= + {query}
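That concatenation can be sketched in a few lines of Python, using the example spreadsheet's base URL from above. The percent-encoding step is a precaution for use in tools; when pasting into a browser's address bar, the browser handles the encoding for you.

```python
from urllib.parse import quote

def sheet_query_url(base_url: str, query: str) -> str:
    """Concatenate the three parts of a Google Sheets query URL,
    percent-encoding the query statement."""
    return base_url + "/gviz/tq?tqx=out:html&tq=" + quote(query, safe="")

base = "https://docs.google.com/spreadsheets/d/1Bm_ssaeF2zkUJR-mG1SaaodNSatGdvYernsE7IJcEDA"
term = "muni"  # the text to search for
query = f"SELECT B, A WHERE (A CONTAINS '{term}') OR (B CONTAINS '{term}')"
print(sheet_query_url(base, query))
```

In a web search tool you would of course keep the tool's placeholder (such as `{}` in memoQ Web Search) in place of the search term rather than scripting anything.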

An example of this in a memoQ Web Search configuration might be:

https://docs.google.com/spreadsheets/d/1Bm_ssaeF2zkUJR-mG1SaaodNSatGdvYernsE7IJcEDA/gviz/tq?tqx=out:html&tq=SELECT B, A WHERE (A CONTAINS '{}') OR (B CONTAINS '{}')

and here you can see a search with that configuration and the characters 'muni' :  https://goo.gl/D5cQmh


Adding custom labels to the results table

If you clicked the short URL given as an example above, you'll notice that the columns are unlabeled. Try this short URL to see the same search with labels: https://goo.gl/3zJQqK

This is accomplished simply by adding LABEL A 'Portuguese', B 'English' to the end of the query string.

If you look at the URL in the address bar for any of the live web examples you'll notice that space characters, quote marks and other stuff are substituted by codes. No matter. You can type in clear text and use what you type; modern browsers can deal with stuff that is ungeeked too.

To do more formatting tricks, RTFM! It's here.





Jul 20, 2017

memoQ Web Search examples for Portuguese

This week I'm in Lisbon teaching a 24-hour Boas Práticas (best practices) evening course on translation technology with David Hardisty and Marco Neves. Tonight we're covering web search with various sites and tools, including memoQ Web Search.

Unfortunately, Kilgray provides examples of configuring the web search only for English and German, and many of the site configurations are defective. And if you have other languages as your working pairs there isn't much you can do with those examples.

In tonight's class we had students working in the following pairs:
  • Portuguese to English
  • English to Portuguese
  • Portuguese to Russian
  • French to Portuguese
  • Spanish to Portuguese
  • German to Portuguese
So we created some example configurations to do web look-ups in all these pairs. And they are available here.

I was a bit surprised to find that I never blogged the chapters of my books that dealt with configuring the web search - I'll have to get around to that one of these days - but the memoQ Help isn't bad for this if you need a little guidance on how to add more site searches or change the configurations of these.

Anyone is welcome to do with the configurations provided here as they please; I hope they will help friends, colleagues and students in the Lusophone world to go a little farther with a great tool.




Jul 3, 2017

Something new out of Africa!

Guest contribution by Obi Udeariri
Photographs provided by Sameh Ragab/EAITA

Many years ago, Pliny the Elder declaimed Ex Africa semper aliquid novi  – "(There's) always something new (coming) out of Africa". He was referring to the continent’s diverse natural resources, but that phrase has come true yet again, because something new has again come out from Africa with respect to its diverse human resources, Homo Africanus interpres.

Nairobi is the capital of Kenya and the jewel of East Africa; the stomping ground of the famed Kenyan writers Ngũgĩ wa Thiong'o and Grace Ogot and the Nobel laureate Wangari Muta Maathai. With its temperate climate and lush wildlife, it’s a favorite holiday destination for hundreds of thousands of tourists each year, who come to enjoy its excellent hospitality and numerous attractions. It’s also home to the African headquarters of the United Nations and another emerging international organization – the East African Interpreters and Translators Association.

The EAITA was formed barely a year ago, with a membership comprising language professionals from across East Africa, and in its brief life it’s already held two major events aimed at boosting professional competence, featuring outstanding keynote speakers from abroad. This year’s event was held on Saturday 1st July, was focused on the use of CAT tools to promote productivity, and was deftly and professionally handled by Sameh Ragab, a vastly experienced translation professional, CAT tools trainer, and certified United Nations Vendor, who graciously gave his audience the benefit of this extensive experience at no cost.

Technology guru Sameh Ragab of Egypt - a favorite teacher at conferences around the world!
The uptake and use of CAT tools and other cutting-edge techniques, and the interest in them, is widespread. This was shown by the mini-summit nature of the event, whose attendees came from all across East Africa: from Kenya itself, Rwanda, Burundi and Tanzania, and from as far afield as the lush and steamy tropical nation of Nigeria. An accentologist would have had a field day.


The immense expansion of language services occasioned by new communication methods and technology has definitely not passed Africa by, contrary to what some may think. African countries have largely overcome their infrastructural issues, and language professionals are busy tapping away, chuchotant (whispering) in interpreting booths and leveraging the latest software for transcription, project management and other needs, doing all this in real time, backed by IT infrastructure to match the best in other countries.

Translation and interpreting have always been a part of life in African countries. Given the continent's ethnically heterogeneous communities and countries, there has always been a need to convey meaning in written or oral form between its peoples, and the average language professional here (who is usually already natively bilingual in one or more of its lingua francas or native languages) is simply taking this inbuilt familiarity with language manipulation to the next level.

In view of the nearly full turnout of EAITA members and the interest generated by this event, international language service providers would do well to screw their monocles firmly in place and divert some of their flighty attention towards the continent’s language professionals. Not as a source of cheap labor, but rather in search of skilled, competent, thoroughbred professionals whose skills and expertise are on a par with anything obtainable worldwide, and whose diverse peoples speak, read, write, translate and interpret an equally diverse range of languages with proficiency including lingua francas such as Swahili, English, Arabic, French, Spanish, Hausa, Igbo and many, many more.

Congratulations to the EAITA for the successful event, which was also supported by the International Association of Professional Translators and Interpreters; I’m looking forward to more new, good things coming out of Africa!

Focused on the future.

*******

Obi Udeariri is a specialized legal translator who translates all kinds of legal documents from French, German and Dutch to English. He has a law degree and several translation certifications and has been a full-time freelance translator for 14 years.

He is the Head of the Nigerian chapter of the International Association of Professional Translators and Interpreters (IAPTI), and lives in Lagos, Nigeria with his wife and two sons.




Jun 25, 2017

NOW is not the National Organization of Words...

... but with over 4 billion of them, that interpretation of the News on the Web corpus at Brigham Young University would be plausible. BYU is known for its high quality research corpora available to the public. The news corpus grows by about 10,000 articles each day, and its content can be searched online or downloaded.

The results are displayed in a highlighted keyword in context (KWIC) hit list with the source publications indicated in the "CONTEXT" column:


As a legal translator, I find the BYU corpus of US Supreme Court Opinions more useful. It displays results in a similar manner:


It is difficult or impossible to configure a direct search in these corpora using memoQ Web Search, IntelliWebSearch or similar integrated web search features in translation environments. However, these tools can be used as a shortcut to open the URL, and the search string can be applied once the site has been accessed. Since I perform searches like this to study context infrequently, a standalone shortcut with IWS serves me best; if I were using this to study usage in a language I don't master very well, like Portuguese (yes there is a Portuguese corpus at BYU - actually, two of them, one historical), then I might include the URL in a set of sites which open every time I invoke memoQ Web Search or a larger set of terminology-related sites in an IntelliWebSearch group.

One great benefit of using such corpora as a language learner is that context and collocations (words that occur together with a particular word or phrase) can be studied easily, better than with dictionaries, enabling one to sound a bit less like an idiot in a second, third, fourth or fifth language. Or for many perhaps, even their first language :-)

Jun 24, 2017

The multilingual toolkit for getting a date in Swahili


Some time ago, I was asked by IAPTI to provide some technical support for a developing effort to assist professional translators in various African regions. The flame of the Translators Without Borders center established a few years ago in Kenya has apparently sputtered out due to an incredibly silly anti-business model which undermined local professionals, so various initiatives were launched to help translators in the region grow stronger together and improve their professional practice.

Since memoQ is perhaps the best tool for managing the challenges of expert translation under the widest range of languages and conditions, I considered how I might contribute to solving some of these and reduce the frustrations of language barriers in Africa. I thought of all the business travelers there, as well as the NGOs and representatives of governments around the world who want a piece of what's there. All alone, strangers in a strange land, sweltering in some Nairobi hotel, how can these people even get a date in Swahili?

Once again, it's Kilgray to the rescue... with memoQ's auto-translation rules!

Using the various methods I have developed and published for planning and specifying auto-translation rules, I assembled an expert team for translation in Swahili, Arabic, Hebrew, English, German, Portuguese, Spanish, French, Russian, Hungarian, Dutch, Finnish, Polish and Greek to draft the rules for getting long dates in Swahili.

And using the Cretinously Uncomplicated Process for Identifying Dates (CUPID), these results can be transmogrified quickly to support lonely translators working from German, French and English into Arabic or from German, French, English and Spanish into Portuguese, for example, or in any combination of the languages applied for Swahili dates or others as needed.

With memoQ and regex-based auto-translation, you'll never be stuck for a quality-controlled date in any language!
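The actual rules live in memoQ's auto-translation rule editor, but the regex idea behind them can be sketched in a few lines of Python. The Swahili month names below are the standard forms; the output format shown (with "tarehe" before the day) is one plausible long-date rendering, and a real rule set would of course cover more date patterns than this.

```python
import re

# Standard Swahili month names keyed by their English equivalents
MONTHS_SW = {"January": "Januari", "February": "Februari", "March": "Machi",
             "April": "Aprili", "May": "Mei", "June": "Juni",
             "July": "Julai", "August": "Agosti", "September": "Septemba",
             "October": "Oktoba", "November": "Novemba", "December": "Desemba"}

# Match an English long date such as "22 August 2017"
DATE_EN = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS_SW) + r") (\d{4})\b")

def en_date_to_sw(text: str) -> str:
    """Rewrite English long dates in Swahili form."""
    return DATE_EN.sub(
        lambda m: f"tarehe {m.group(1)} {MONTHS_SW[m.group(2)]} {m.group(3)}",
        text)

print(en_date_to_sw("Delivered on 22 August 2017."))
# → Delivered on tarehe 22 Agosti 2017.
```

In memoQ itself the same pattern-and-replacement pairs go into an auto-translation rule resource, which then offers the converted dates as hits during translation and flags deviations in QA.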

Germany needs Porsches! And Microsoft has the Final Solution....


I hear that Germany is suffering from a shortage of Porsches. Odd, given that the cars are made there and should be readily available, but it's true, because my friend who lives there told me. He owns a large, successful LSP (Linguistic Sausage Production) company, and to celebrate its rise in revenues, he decided to get everyone on the sales staff a new Porsche as a company car. The problem is that he can't find any for €5,000.

So he was left with no choice but to cut overhead using the latest technologies. Microsoft to the rescue! With Microsoft Dictate, his crew of  intern sausage technologists now speak customer texts into high-quality microphones attached to their Windows 10 service stations, and these are translated instantly into sixty target languages. As part of the company's ISO 9001-certified process, the translated texts are then sent for review to experts who actually speak and perhaps even read the respective languages before the final, perfected result is returned to the customer. This Linguistic Inspection and Accurate Revision process is what distinguishes the value delivered by Globelinguatrans GmbHaha from the TEPid offerings of freelance "translators" who won't get with the program.

But his true process engineering genius is revealed in Stage Two: the Final Acquisition and Revision Technology Solution. There the fallible human element has been eliminated for tighter quality control: texts are extracted automatically from the attached documents in client e-mails or transferred by wireless network from the Automated Scanning Service department, where they are then read aloud by the latest text-to-speech solutions, captured by microphone and then rendered in the desired target language. Where customers require multiple languages, a circle of microphones is placed around the speaker, with each microphone attached to an independent, dedicated processing computer for the target language. Eliminating the error-prone human speakers prevents contamination of the text by ums, ahs and unedited interruptions by mobile phone calls from friends and lovers, so the downstream review processes are no longer needed and the text can be transferred electronically to the payment portal, with customer notification ensuing automatically via data extracted from the original e-mail.

Major buyers at leading corporations have expressed excitement over this innovative, 24/7 solution for globalized business and its potential for cost savings and quality improvements, and there are predictions that further applications of the Goldberg Principle will continue to disrupt and advance critical communications processes worldwide.

Articles have appeared in The Guardian, The Huffington Post, The Wall Street Journal, Forbes and other media extolling the potential and benefits of the LIAR process and FARTS. And the best part? With all that free publicity, my friend no longer needs his sales staff, so they are being laid off and he has upgraded his purchase plans to a Maserati.



The other sides of Iceni in Translation


The integration of the online TransPDF service from Iceni in memoQ 8.1 has raised the profile of an interesting company whose product, the Infix PDF Editor, has been reviewed before on this blog. TransPDF is a free service which extracts text content from PDF files, converts it to XLIFF for translation in common translation environments, and then re-integrates the target text from the translated XLIFF to create a PDF file in the target language.

This is a nice thing, though its applicability to my personal work is rather limited, as not many of my clients would be enthusiastic if I were to send PDF files as my translation results. Sometimes that fits, sometimes not. And of course, some have raised the question of whether using this online service is compatible with some non-disclosure restrictions.

I think it's a good thing that Kilgray has provided this integration, and I hope others follow suit, but for the cases where TransPDF doesn't meet the requirements of the job, it is useful to remember Iceni's other options for preparing text for translation.

Translatable XML or marked-up text export
For as long as I can remember, the Infix PDF Editor has offered the option to export text on your local computer (avoiding potential non-disclosure agreement violations) so that it can be translated and then re-imported later to make a PDF in the target language. Only the location of this option in the menus has changed: the menu choices for the current version 7 are shown below.



This solution suffers from the same problem as the TransPDF service: not everyone will be happy with the translation in PDF, as this complicates editing a little. However, I find the XML extract very useful to put the content of PDF files into a LiveDocs corpus for reference or term extraction. The fact that Infix also ignores password protection on PDFs is also helpful sometimes.

"Article" export
The Article Tool of  the Iceni Infix PDF Editor enables various text blocks on different pages of a PDF file to be marked, linked and extracted in various translatable formats such as RTF or HTML. The quality of the results varies according to the format.


Once "articles" are defined, they are exported via the command in the File menu:


The RTF export has some problems, as this view in Microsoft Word with the format characters made visible reveals:


However, the Simple HTML export opened in Microsoft Word shows no such troubles (and can be saved in RTF, DOCX or other formats):


Use of the article export feature requires a license for the Infix PDF editor, unlike the XML or marked-up text exports for translation. In demo mode, random characters are replaced by an "X" so that one can see how the function works but not receive any unjust enrichment from it. However, this feature has significant value for the work of translators and is well worth an investment, as the results are typically better than using OCR software on a "live" (text-accessible) PDF file.

But wait... there's more!
Version 7 also has an OCR feature:


I tested it briefly on some scanned Portuguese Help Wanted ads that I'll probably use for a corpus linguistics lesson this summer; the results didn't look too awful all considered. This feature is worth a closer look as time permits, though it is unlikely to replace ABBYY FineReader as my tool of choice for "dead" PDFs.

Jun 23, 2017

Terminology output management with SDL MultiTerm


I have always liked SDL MultiTerm Desktop - since long before it was an SDL product, back when it came as part of the package with my Trados Workbench version 3 license.

Then, as now, Trados sucked as a working tool, so I soon switched to Atril's Déjà Vu for my translation work, and after 8 or 9 years to memoQ, but MultiTerm has continued to be an important working tool for my language service business. I extract and manage my terminology with memoQ for the most part, but when I want a high-quality format for sharing terminology with my clients' various departments, there is currently no reasonable alternative to MultiTerm for producing good dictionary-style output.

Terminology can be exported from whatever working environment you maintain it in, and then transferred to a MultiTerm termbase using MultiTerm Convert or other tools. In the case of memoQ, there is an option to output terms directly to "MultiTerm XML" format:


Fairly simple; there are no options to configure. Just select the radio button for the MultiTerm export format at the top of any memoQ term export dialog. And what do you get?


Three files are produced; the XML file with the actual term data and the XDT file with the termbase specifications are the important ones. The latter is used to create the termbase in SDL MultiTerm. If you have an existing termbase to use in MultiTerm, you won't need the XDT file, though if that termbase is not based on Kilgray's XDT file there might be some mapping complications for the term import from the XML file.
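If you are curious what the exported term data look like inside, a short Python sketch can walk the typical conceptGrp → languageGrp → termGrp layout of MultiTerm-style XML. The sample embedded here is a simplified stand-in I wrote for illustration, not an actual memoQ export, which carries more metadata.

```python
import xml.etree.ElementTree as ET

# Simplified sample in the typical MultiTerm XML layout (illustrative only)
SAMPLE = """<mtf>
  <conceptGrp>
    <concept>1</concept>
    <languageGrp>
      <language type="English" lang="EN"/>
      <termGrp><term>centrifugal pump</term></termGrp>
    </languageGrp>
    <languageGrp>
      <language type="Portuguese" lang="PT"/>
      <termGrp><term>bomba centrífuga</term></termGrp>
    </languageGrp>
  </conceptGrp>
</mtf>"""

def list_terms(xml_text: str) -> list[tuple[str, str]]:
    """Return (language code, term) pairs from MultiTerm-style XML."""
    root = ET.fromstring(xml_text)
    pairs = []
    for lang_grp in root.iter("languageGrp"):
        lang = lang_grp.find("language").get("lang")
        for term in lang_grp.iter("term"):
            pairs.append((lang, term.text))
    return pairs

print(list_terms(SAMPLE))
# → [('EN', 'centrifugal pump'), ('PT', 'bomba centrífuga')]
```

This sort of quick inspection can also help diagnose the mapping complications mentioned above when importing into a termbase with a different definition.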

Now let's create a termbase in SDL MultiTerm 2017 Desktop:


Give it a name:


When the termbase wizard starts, choose the option to load an existing termbase definition and select the XDT file created by memoQ:



At the end of the process you will have an empty Multiterm termbase into which the data in the XML file are imported:



Now you'll have an SDL Multiterm termbase with the glossary content exported from memoQ. This is a process which can be carried out when sharing terminology with a colleague who uses SDL Trados Studio for translation, for example. If they don't know how to use the import functions of SDL Multiterm or you want to save them the bother of doing so, just share the SDLTB file.



Now that the glossary is in MultiTerm, it can be exported in various formats which can be helpful to people who prefer the data in a more generally accessible form. Please note that this is not done using the export functions under the File menu! SDL MultiTerm is a program originally developed by German programmers, who have their own Konzept of Benutzerfreundlichkeit. Even in the hands of Romanian developers, it's still kinda weird. The desired functions are found, of course, in the Termbase Management area:


In keeping with the German Benutzerfreundlichkeitskonzept, the command to generate the desired output is Process, of course.

There are a number of predefined output templates included with MultiTerm. I usually use a version of the "Word Dictionary" export definition, which produces a two-column RTF file; by default this gives output like the following:



I prefer something a little different, so I have prepared various improved versions of this output definition. I usually edit the text, adjust the column breaks as needed and clean up any garbage (such as redundant initial letters caused by accented vowels in a language like Portuguese); then I slap a cover page on the file and make a PDF out of it or create a nice printed copy, possibly with other page size formatting. Here is an example:

Example PDF dictionary output - click to enlarge

Other possible output formats include HTML, which can be useful for term access on an intranet, for example. Custom definitions can be created by cloning and editing an existing definition; these are specific to a given termbase. If you want to apply a custom export definition to another termbase, export it as an XDX file and then load it for the other termbase. The definition file used to generate the example above is available here.

One essential weakness of the SDL export definition which has always annoyed me is the failure to include the last word on the page in the header as most proper dictionaries do. I addressed this in the definition with my limited knowledge of RTF coding, but the change can be made manually in Microsoft Word too, for example, by copying and pasting the SortTerm field and editing it to add the \l argument:
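With field codes displayed in Word (Alt+F9), the header entries in this kind of dictionary output are STYLEREF fields referring to the SortTerm paragraph style (the style name may differ in a customized definition). Duplicating the field and adding the \l switch yields the last styled term on the page instead of the first:

```
{ STYLEREF SortTerm }       first SortTerm-styled term on the page (default)
{ STYLEREF SortTerm \l }    last SortTerm-styled term on the page
```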


There are, of course, other and possibly better ways to get some nice output formats from memoQ glossaries or termbases in other tools. One approach with memoQ is to create XSL scripts to process the MultiTerm XML output from memoQ. For years I have been hoping that Kilgray would create a simple extension to the term export dialog in memoQ, which would allow XSL scripts to be chosen and a transformation applied when the data are exported. It really is a shame that after more than a decade the best translation environment tool available - memoQ - still cannot match the excellent formatted output that my clients and I have enjoyed with MultiTerm since I first started using that program 17 years ago!
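As an illustration of that XSL approach, here is a minimal stylesheet sketch (again assuming the conceptGrp/languageGrp/termGrp/term layout of the memoQ export, with exactly two languages per entry) which renders a two-column HTML glossary; it could be applied with any XSLT processor:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Render each concept entry as one table row:
       first language in the left column, second in the right. -->
  <xsl:template match="/">
    <html><body><table>
      <xsl:for-each select="//conceptGrp">
        <tr>
          <td><xsl:value-of select="languageGrp[1]/termGrp/term"/></td>
          <td><xsl:value-of select="languageGrp[2]/termGrp/term"/></td>
        </tr>
      </xsl:for-each>
    </table></body></html>
  </xsl:template>
</xsl:stylesheet>
```

A real production stylesheet would of course also handle definitions, usage examples and sorting, but the skeleton shows how little is needed to get started.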



Jun 22, 2017

Translation alignment: the memoQ advantage

The basic features for aligning translated texts in memoQ are straightforward and can be learned easily from Kilgray documentation such as the guides or memoQ Help. However, there are three aspects of alignment in memoQ which I think are worth particular attention and which distinguish it in important ways from alignments performed with other translation environment tools or aligners.

Aligning the content of two folders with source and target documents; automatic pairing by name
The first is memoQ’s ability to pair documents for alignment automatically based on the similarity of their names. This means that large numbers of files can be matched without manual intervention, whether you work with individual files or with entire folders containing perhaps hundreds of them. Thus, if the source files are contained in one folder, the translated files in the target language are in a different folder, and the source and target file names are similar, the alignment process for a great number of files can be set up and run in a matter of minutes. Note in the example screenshot above that different file types may be aligned with each other.
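Kilgray has not published the actual matching algorithm, but the idea of pairing by name similarity can be sketched in a few lines, here with Python's difflib as a stand-in for whatever similarity measure memoQ really uses:

```python
import difflib

def pair_by_name(source_names, target_names):
    """Greedily match each source file to the most similarly named target file.

    Purely illustrative: memoQ's own pairing logic is not public, but any
    string similarity ratio produces the same kind of result for typical
    file naming conventions.
    """
    pairs = []
    remaining = list(target_names)
    for source in source_names:
        best = max(remaining,
                   key=lambda t: difflib.SequenceMatcher(None, source, t).ratio())
        pairs.append((source, best))
        remaining.remove(best)
    return pairs
```

With conventional naming ("Jahresbericht_2016_DE.docx" next to "Jahresbericht_2016_EN.docx"), such a similarity match almost always finds the right partner.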

The second important difference with alignment in memoQ is that it is really not necessary to feed the aligned content to a translation memory. memoQ LiveDocs alignments essentially function as a translation memory in the LiveDocs corpus, with one important difference: by right-clicking matches in the translation results pane or a concordance hit list, the aligned document can be opened directly and the full context of the content match can be read. A match or concordance hit found in a traditional translation memory is an isolated segment, divorced from its original context, which can be critical to understanding that translated segment. LiveDocs overcomes this problem.

A third advantage of alignment in memoQ is that, unlike environments in which aligned content can only be used after it is fed to a translation memory, a great deal of time can be saved by not “improving” the alignment unless its content has been determined to be relevant to a new source text for translation. If an analysis shows that there are significant matches to be found in a crude/bulk alignment, the specific relevant alignments can be determined and the contents of these finalized while leaving irrelevant aligned documents in an unimproved state. Should these unimproved alignments in fact contain relevant vocabulary for concordance searches, and if a concordance hit from them appears to be misaligned, opening the document via the context menu usually reveals the desired target text in a nearby segment.

Concordance lookup in memoQ with direct access to an aligned document in a LiveDocs corpus



Jun 16, 2017

Troubleshooting memoQ light resource import problems

The other day I sent a friend some updated auto-translation rules for currency expressions; a short time later I received a message that they would not import into memoQ. The error message displayed was the following:


Now the problem here might seem obvious, but the name of the file I sent was nothing like that of any rule she already had installed.


In the example shown below, the source of the trouble is more obvious, but if there are many resources in the list shown in the Resource Console or elsewhere, the clash between the name in the import dialog and an existing resource name in the list might not stand out so clearly...


In the MQRES file (for the memoQ light resource), the "trouble spot" is in the XML header at the top of the file. This can be seen by opening the file in any text editor (in this case I used Notepad++ to show line numbering):


In this case, the fifth line contains the name that will be applied to the resource after it is imported. The <Name> tags are found in all kinds of memoQ light resources, and the same problem will occur if a redundancy is found during import. Here is an example from a memoQ ignore list (used to exclude certain words from error indications by spellchecking functions):


There are a couple of ways to avoid or correct these problems:
  • First of all, when a ruleset is edited, the text enclosed by the Name tags should be altered. It's probably a good idea to update the Description as well. The FileName is actually ignored and need not be updated; a mismatch with the actual name of the MQRES file will not cause any trouble during import.
  • When importing a light resource, you can always change the information read from the Name and Description tags of the MQRES file. This avoids the conflict.

              
      
  • The name and description of an existing light resource can be edited via the Properties of the resource in the Resource Console or in Project Home > Settings. Accessing the resource via memoQ Options will currently (as of version 8.1) not show the Properties.

               
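If you prefer to fix the file itself before importing, the name can be changed in any text editor, or with a small script like this sketch, which simply rewrites the value of the first Name element (keep a backup of the original file; the MQRES structure used in the test is illustrative only):

```python
import re

def rename_mqres(path, new_name):
    """Replace the first <Name>...</Name> value in a memoQ light resource file."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # Only the first Name element is the resource name; count=1 leaves
    # any other occurrences in the file untouched.
    text = re.sub(r"<Name>.*?</Name>",
                  "<Name>{}</Name>".format(new_name),
                  text, count=1)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```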
memoQ's "light resources" - the portable configurations and information lists to assist various translation tasks - are one of the environment's greatest strengths, but the generally bad state of the associated editing tools and unhelpful error handling continue to cause a lot of unnecessary confusion among users. Key people at Kilgray are not unaware of this problem, and for years there has been a debate regarding new features versus actual usability of the features already present. When you encounter difficulties like the one described above - or other troubles using this generally excellent, leading translation assistance tool - it is important to communicate your concerns to Kilgray Support (support@kilgray.com). 

Without appropriate feedback from the wordface, there is often really no way for the designers and product engineers to understand and prioritize the challenges of usability. I can understand the reluctance to speak up among those who have used other tools for many years and saw their requests for bug fixes or other improvements largely ignored, but it really does make a difference, though not always on a time scale of hours or days. Weeks, months, sometimes years may pass before important changes are made, but usually this is because the urgency of the matter has not been communicated with sufficient clarity, or because there are, in fact, more pressing matters which require attention. Serious concerns, however, are seldom ignored by those responsible, as nine years as a satisfied user have shown me.


Jun 11, 2017

On a TEUR with German financial translation


Currency expressions occur in great variety in German financial texts, and it is often a great nuisance to type and check the corresponding expressions, correctly formatted, in the target language. One group of such expressions involves thousands of euros, typically written in German as "TEUR", although depending on the proclivities of the source text author, other forms such as T€, kEUR or k€ may be encountered.

On the target side, clients might want to see figures like "TEUR 1.352" rendered in a number of ways: perhaps EUR 1,352 thousand, perhaps €1,352k, perhaps something else.
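memoQ auto-translation rules are written in the program's own XML-based rule syntax, but the regex logic underneath can be illustrated in a few lines; the pattern and the "EUR ... thousand" target format below represent just one of the possible renderings mentioned above:

```python
import re

# Match the common German thousand-euro notations: TEUR, T€, kEUR, k€,
# followed by a number with optional German thousands separators.
PATTERN = re.compile(r"(?:TEUR|T€|kEUR|k€)\s*(\d{1,3}(?:\.\d{3})*)")

def teur_to_english(text):
    """Render e.g. 'TEUR 1.352' as 'EUR 1,352 thousand'."""
    def repl(match):
        # Swap the German thousands separator (.) for the English one (,).
        return "EUR {} thousand".format(match.group(1).replace(".", ","))
    return PATTERN.sub(repl, text)
```

A rule targeting "€1,352k" instead would only need a different replacement string; the source-side pattern stays the same, which is why mapping the source variants carefully pays off.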

I have described before how to map out source and target equivalents for auto-translation rules or regex-based quality checks, a mapping which serves as the basis for development specifications and test cases, and how to document the structure and reasoning of the respective rules.

Here you can download an example of possible solutions to the specific problem described above. The downloadable ZIP archive contains two different rulesets for each of the English target text formats cited above; these may be adapted to fit the particular requirements of a client as needed.

There are, of course, quite a number of other currency expressions one routinely encounters when translating financial texts or other business documents, and the diversity of client preferences for target language formats can be considerable. In many cases it is worthwhile to document which ruleset corresponds to which client's preference, perhaps even including client names in the filenames to keep things straight. Thus "KPMG_TEUR-to-English" might be the ruleset name for the client KPMG's preferred way of translating those particular expressions into English.

Busy financial translators who use memoQ and who have discovered the benefits of rulesets like these tell me time and again how many hours or days of effort are saved routinely by using tools like these in translation and subsequent quality checks. They are a "secret weapon" in an often competitive environment with a lot of short, stressful deadlines.

Those who wish to have rulesets of their own to handle the specific requirements of their clients can turn to a number of sources for help. Kilgray's Professional Services department can develop custom rules, as can competent consultants such as Marek Pawelec or yours truly. One caveat: in hiring development experts for memoQ tools based on regular expressions (regex), it is generally a good idea to work with consultants whose primary focus is memoQ. Regular expressions are used in many other environments, such as Apsic Xbench and SDL Trados Studio (as well as many others having nothing to do with translation), but without an intimate, daily working acquaintance with memoQ, developers are often unable to understand the best approaches for working with the memoQ environment and it is all too possible to spend a lot of money on custom work which proves to be unusable, for example because the complex rules take many minutes to load each time a project or document is opened, because the developer did not break the problem down efficiently into its component parts. But done right, these rulesets are an investment which can pay enormous dividends for many specialist translators.