Translation Tribulations: 10/1/11

Oct 22, 2011

Compatibility workflows with the memoQ Translator Pro edition (Part 1)

Yesterday I had the privilege to present the first of a series of workshops intended to convey my ideas for small-scale outsourcing management with the version of memoQ typically purchased by freelance translators. The participants were project managers at a translation agency that has begun to test the waters for using memoQ to overcome long-term compatibility issues between Trados versions in their accustomed workflows. I have been supporting them occasionally as a consultant over the past two years to deal with sticky issues of text encoding and translators who can't follow directions while working with a disturbing range of tools they often haven't mastered. It's been fun, and I've learned a lot from the infinite human capacity to instinctively ferret out the weaknesses of software and processes.

So I decided to put together a personal overview of the compatibility interfaces for the memoQ Translator Pro edition and my own thoughts on best practice and share it with my colleagues. I wanted to avoid burying everyone in technical detail but instead present the material in a way that most anyone can understand and apply. I don't believe in silly notions such as expecting the average intelligent user to learn and remember the use of regular expressions and other arcana that I, despite four decades of IT experience, continue to struggle with myself too often.

The presentation

referred to memoQ version 5 Translator Pro edition
focused on facilitating project workflows with different platforms rather than actual translation
was intended for anyone outsourcing on a small scale for a single target language in a project (multiple target languages require the memoQ Project Manager or server editions)

It was delivered in two parts in a three-hour period, with a long break for coffee, chat, snacks and checking e-mail or testing ideas learned in the first part. Participants were provided with screenshots of the main application screens and did not sit in front of computers but engaged in the discussion. The actual project experience and understanding of the participants was polled at appropriate intervals to make sure that the delivery was relevant and that the information was understood and could be applied.

The goal was to achieve an understanding of memoQ as a central platform for

translation project input - files, translation memories, terminology and reference material
format conversion to facilitate work with different translation environment techniques and tools
translation
editing and quality assurance
creation of deliverable target files and other resources such as term lists, special review formats and commentaries

memoQ is "compatible" with

SDL Trados in all versions (though it is important to choose the right compatibility workflow!)
Star Transit
quite a number of other commercial and Open Source translation environment tools
various content management systems (CMS)
translators who decline to use any tool other than a word processor
and of course memoQ!

so except in the case of projects requiring live, direct work on a third-party translation server platform (such as one from SDL), some reasonable workflow can be found to collaborate with almost anyone using other tools.

memoQ is sort of like the Swiss Army knife of translation environment tools when it comes to compatibility. Only better. Some say it's more compatible with Trados than Trados. And in many cases they're right.

Output formats for translators
memoQ can prepare content for translation in

optimized formats for memoQ users
Trados-compatible bilingual DOC files
XLIFF, a standard used by many environments
RTF tables for those without special translation tools or for others to review, comment and answer questions using only a word processor

TM & terminology data
memoQ reads translation memory data in TMX and delimited text formats and outputs it to TMX. Term data is read in the same formats as TM data but output only to delimited text formats and a particular SDL Trados MultiTerm XML format.
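For readers who have never looked inside these exchange formats, here is a minimal sketch of what moving between TMX and delimited text involves. It is not how memoQ does it internally; the sample TMX content, the function names and the choice of tab as the delimiter are all my own assumptions for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Tiny made-up TMX file for illustration only.
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="de" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="de"><seg>Guten Morgen</seg></tuv>
      <tuv xml:lang="en"><seg>Good morning</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="de"><seg>Vielen Dank</seg></tuv>
      <tuv xml:lang="en"><seg>Thank you very much</seg></tuv>
    </tu>
  </body>
</tmx>"""


def tmx_to_rows(tmx_text, src_lang, tgt_lang):
    """Return (source, target) pairs for every unit that has both languages."""
    root = ET.fromstring(tmx_text.encode("utf-8"))
    rows = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() flattens any inline markup inside <seg>.
                segs[lang] = "".join(seg.itertext())
        if src_lang in segs and tgt_lang in segs:
            rows.append((segs[src_lang], segs[tgt_lang]))
    return rows


def rows_to_tsv(rows):
    """Serialize the pairs as tab-delimited text, one pair per line."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


print(rows_to_tsv(tmx_to_rows(SAMPLE_TMX, "de", "en")))
```

Note that flattening inline markup, as this sketch does, throws away formatting information that a real translation memory exchange would normally want to keep.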

memoQ can also integrate with external termbases, TM sources and machine translation engines.

I sometimes think of memoQ as the hub of a wheel with translators, reviewers and customers working with many different environments as the "spokes".

Basic project management steps with memoQ

These typically involve:

1. Reading in the data after it is properly prepared

files to translate in whatever source format
translation memory data or reference corpora
terminology data
special segmentation rules (SRX files and segmentation exceptions) or other configuration data for optimized workflows

In this step it is important to choose the best method of data import and the appropriate filter or combination of filters. In memoQ, filters can be cascaded to convert and protect sensitive data as tags. Thus HTML and placeholder tokens contained in cells of an Excel file might be protected by "chaining" an HTML filter and a custom filter using regular expressions after the usual filter for Microsoft's Excel format.
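The idea of chaining filters can be sketched outside of memoQ in a few lines. This is only an illustration of the principle, not memoQ's filter implementation: the two regular expressions, the %TOKEN% placeholder style, the {n} tag notation and the function names are assumptions of mine.

```python
import re

# Filter 1: anything that looks like an HTML tag.
HTML_TAG = re.compile(r"</?[A-Za-z][^<>]*>")
# Filter 2: hypothetical placeholder tokens such as %USER_NAME%.
TOKEN = re.compile(r"%[A-Za-z_]+%")


def protect(text, filters):
    """Apply each filter in turn, replacing its matches with numbered tags."""
    protected = []
    for pattern in filters:
        def repl(match):
            protected.append(match.group(0))
            return "{%d}" % len(protected)
        text = pattern.sub(repl, text)
    return text, protected


def restore(text, protected):
    """Put the original content back in place of the numbered tags."""
    return re.sub(r"\{(\d+)\}", lambda m: protected[int(m.group(1)) - 1], text)


cell = "Hello <b>%USER_NAME%</b>, you have %COUNT% new messages."
tagged, protected = protect(cell, [HTML_TAG, TOKEN])
print(tagged)  # Hello {1}{3}{2}, you have {4} new messages.

translated = "Hallo {1}{3}{2}, Sie haben {4} neue Nachrichten."
print(restore(translated, protected))
```

Because the filters run one after another, the tags are numbered by filter order rather than by position in the text. The sketch also assumes the cell content contains no literal {n} sequences of its own.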

2. Analyzing the data

Many options are available here, including the weighting of tags to compensate for the extra effort involved with complex formats and determining internal similarities in a text (aka "homogeneity" or "fuzzy repetitions") to facilitate better project planning.
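To make the tag weighting idea concrete, here is a rough sketch of how a weighted count might be calculated. The {n} tag notation, the function name and the default weight of a quarter word per tag are assumptions for illustration; memoQ's own statistics are configured in the program and will differ in detail.

```python
import re

# Inline tags written as {1}, {2}, ... (an assumed notation).
TAG = re.compile(r"\{\d+\}")


def weighted_word_count(segments, tag_weight=0.25):
    """Return (words, tags, weighted total), counting each tag as a fraction of a word."""
    words = 0
    tags = 0
    for segment in segments:
        tags += len(TAG.findall(segment))
        # Remove the tags before counting so they do not glue words together.
        words += len(TAG.sub(" ", segment).split())
    return words, tags, words + tag_weight * tags


segments = ["Click {1}Save{2} to continue.", "No tags here."]
print(weighted_word_count(segments))  # (7, 2, 7.5)
```

The weight is a negotiating point, not a technical constant: a heavily formatted file with a higher weight per tag simply yields a higher billable volume.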

3. Extracting terminology (particularly useful for large projects for one or more translators)

4. Preparing and exporting files for translators

Projects can also be sent to translators using memoQ as handoff packages or complete backups with all attached TMs, termbases and corpora.

5. Receiving and re-importing translated content

6. Review, QA and feedback workflows

7. Generating target files and other information for delivery, final statistics

Recommendations for best practices in choosing formats for translators, reviewers and others using a variety of tools will be covered in the second part of this summary. Those interested in a live presentation or relevant materials are welcome to contact me privately.

Oct 20, 2011

Q&A with SDL on the Studio 2011 upgrade

Once again this evening, a friend contacted me with questions regarding the upgrade from SDL Trados Studio 2009 to the new Studio 2011 version. And once again I was at an utter loss for answers, because I haven't even had time to think about the upgrade myself, though after the excellent demonstration I saw at TM Europe in Warsaw last month I will eventually do it for developing interoperable workflows with memoQ and other tools.

So once again I turned to my helpful SDL support guru, who provided answers to my friend's questions. These are questions likely shared by others, so here they are with the answers:

1. Do I uninstall Trados 2007 and Studio 2009 BEFORE installing the update?
[SDL] Only if you want to. 2011 is completely independent. If I was you I would keep all three so you feel safer (even though 2007 and 2011 are enough). 2007 is useful to keep in case you want to create bilingual word files yourself, or connect to 2007 server solutions directly (soon this won’t be necessary either). TTX you can create with TTXiT. Otherwise no real need for 2007 or 2009, 2011 will handle anything else.

2. Do I have to return my Studio license before I get a new license for 2011?
[SDL] No, we haven’t asked for this. 2011 uses a completely new, and simple, licensing system so this is also independent. But once you upgrade (so don’t purchase an additional license, just turn the 2009 into 2011) then your 2009 license will disappear from your “My Account”. It will still work, but if you lose it then it’s gone. If you purchase an additional license for 2011 then you will obviously keep them both.

3. What about Multiterm? Do I need to upgrade that too and will that cost me extra? – from what I could see in my account, it would (or is it somehow included in the Freelance Plus version?). Bear in mind I already have Multiterm 2009 and my customer is mainly bothered with me updating to the new Trados 2011 – could I theoretically use MT 2009 together with Trados 2011?
[SDL] If you have it already then the upgrade will also upgrade Multiterm. This is also independent and for this reason alone I would recommend installing it. No more silly side by side issues..!

Oct 19, 2011

Kindled spirits

It was during discussions in the breaks at the recent TM Europe conference in Warsaw that I began to think the previously unthinkable. Later, as the son of a conference organizer showed me his Amazon Kindle, shortly before my dog knocked it into a pond and stole the boy's lunch, and others present told me how well the device worked for them, I decided to go against my grain and get the gadget to celebrate the old GDR's Day of the Republic.

I'm glad I did. I used to be quite a gadget freak in my younger days, an early adopter of generations of electronic organizers before the Sharp Wizard was a twinkle in a corporate marketer's eye. But the volumes of electronic junk to be disposed of in my various moves, as well as the grinding pace of 'e-progress', have made me deeply skeptical of the value of most technology.

The Kindle has made reading easy again for me. I was very surprised to find that no one was deluded in telling me that the screen contrast and reflectivity are much like paper, and with my little leather case and its integrated reading light, I can even enjoy a quiet read in the dark of night up in my loft. I can adjust the size of the fonts to read comfortably with or without my glasses. On my most recent excursion to escape intrusive neighbors and veterinary horrors to get a bit of recuperative quiet and perhaps accomplish some work, I carried a small library of dozens of classic literary works, some familiar, some not, my favorite newspapers, dictionaries, a few blogs and a vampire novel all in my half-pound Kindle, and I enjoyed more relaxed reading than I have in the past six months. It's a godsend.

I've found a few freeware tools for converting documents to readable formats for the Kindle, and I plan to convert some of my important translation glossaries for reference purposes. I have a notion that this little piece of technology might assist me in taking more of certain kinds of translation work off the technology grid to savor it like a fine wine in a more traditionally influenced but integrated working mode. I'm quite a late adopter in this case; when I ask, it seems that quite a few translating colleagues have such devices. But do they use them in some way professionally? Do you?

Oct 16, 2011

All the myriad "languagepedias" compared

Another useful tip from the latest ToolKit newsletter by Jost Zetzsche is about a multilingual Wikipedia listing comparison tool, Manypedia. It provides a simple interface for comparing entries in any two specified languages, for example to get a quick overview of relevant terminology.

The screenshot above is one I made comparing pages I used to read up on the disease that nearly killed my dog last week, two days after he passed all his hunting utility qualifications for Germany. (He is now on the mend after several harrowing days and a few pointed discussions with the veterinary clinic where he was first misdiagnosed, then given the wrong treatment when the tick-borne parasites were identified.)

This is an extremely useful tool for me, and I like the comparison of side-by-side text displays in the two languages. I do in fact often use Wikipedia to get a "feel" for comparative terminology in subject areas, so I will be making use of this.

A proposed model for compensating MT-supported translation

The latest ToolKit newsletter by Jost Zetzsche contains a particularly interesting selection of free and premium content, the latter including what is probably the best overview I've seen yet of the innovations in Kilgray's memoQ 5. However, one of the most intriguing sections was the review of MemSource, in which a model for evaluating and compensating translation content produced with MT assistance was mentioned:

" ... what's the new paradigm that is being proposed (and used) by this tool? It's how machine-translated matches are optionally analyzed and potentially charged. If, they say, we can evaluate TM matches by the source (a perfect match is an identical source and a fuzzy match a similar source), we should look at MT matches by the target.
What?
Well, if there has been no change between a machine-translated target segment and a final target segment, it should be viewed as a perfect MT match and charged accordingly. If the changes are minor, it should be viewed as a fuzzy match, etc."

Up to now, the only really plausible proposals I've seen for compensating MT post-editing or MT-assisted translation have involved hourly work; those I've seen for piecework (word, page or line rates) have been exploitative at best. However, should one engage in the dubious practice of applying machine translation for language services, this paradigm is the most reasonable I have seen yet for assessing the "quality" of the output as seen by the degree of modification required. If for a particular purpose the MT output requires nearly a complete rewrite, it would essentially be paid as a completely new translation from scratch. If the MT output is in fact of somewhat useful quality, this provides a good quantitative means of assessing that and figuring the charges.

I will not say that I support drinking from the poisoned well of machine translation output and affecting the quality of one's other work in the same way that editing monkey work and reading trashy texts might do, but if one were to engage in such a foolish endeavor and join the lemmings surging toward the special interests' "future", this is at least an economic model worth discussing.

Of course, those who favor fixed rates for simple-minded budget planning will still want their straight word rates based on possibly untenable assumptions of quality. But the MemSource paradigm, which could be adopted easily by others or implemented with simple quantitative comparison tools (possibly even looking at an overall percentage change in structure, which might include rearrangements of blocks of text), is, as I see it, the first reasonable, practical suggestion for skipping over the nonsense of up-front "quality metrics", getting to work and letting the chips fall where they probably should.
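A simple quantitative comparison of the kind I have in mind could look something like the following sketch. It is not MemSource's method, which I have not seen; the word-level comparison, the 75% threshold and the band names are my own assumptions.

```python
from difflib import SequenceMatcher


def mt_match_rate(mt_output, final_target):
    """Similarity between raw MT output and the delivered text, on word level (0.0 to 1.0)."""
    return SequenceMatcher(None, mt_output.split(), final_target.split()).ratio()


def match_band(rate, fuzzy_threshold=0.75):
    """Map a similarity rate to a billing band (threshold is an arbitrary assumption)."""
    if rate == 1.0:
        return "perfect MT match"
    if rate >= fuzzy_threshold:
        return "fuzzy MT match"
    return "new translation"


pairs = [
    ("Switch off the device before cleaning", "Switch off the device before cleaning"),
    ("Switch off the device before cleaning", "Switch off the unit before cleaning"),
    ("Device cleaning off before", "Always disconnect the power first"),
]
for mt, final in pairs:
    rate = mt_match_rate(mt, final)
    print(f"{rate:.2f}  {match_band(rate)}")
```

The important point is that the analysis happens after the work is done, on the target side, so nobody has to guess the quality of the MT output in advance.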

Oct 12, 2011

Working between SDL Trados Studio 2009 and 2011

Recently I was asked by a friend, who for inexplicable reasons considers me to be some sort of Trados expert, whether she would have to upgrade to the new SDL Trados Studio 2011 because a client of hers had done so. Although there are many good reasons to upgrade to the new version, in her case there are valid personal reasons to delay doing so for a few months.

As usual, I didn't have a clue how to answer her question about whether her SDL Trados Studio 2009 Freelance software could open a package created in Studio 2011 by her client. So I took the lazy but safe path and asked an SDL employee I trust (yes, there are such people, believe it or not). He responded with the following comment:

"Answer to your question is yes... 2011 also has an option to create a package specifically for 2009 so you can go both ways quite easily... the creation of a specific package needs to be based on the receiving tool because of differences in the way projects are handled between the two, but the return package created is good for 2009 or 2011."

Not surprising that there is compatibility here, really, given that the new version also offers legacy compatibility with the old Trados Workbench RTF/Word bilingual format used by several other tools such as Wordfast Classic. Overall, it seems quite worthwhile for a user of Studio 2009 to upgrade, but there is no urgency to do so to remain compatible for working with others who have Studio 2011.

Oct 7, 2011

New version of CodeZapper

While I was traveling this week, our esteemed colleague Dave Turner released version 2.8 of his CodeZapper macros for Microsoft Word. I have written about these before; they are among the finest tools I know for cleaning up messy RTFs and MS Word formats that make work with translation environment tools Hell because of superfluous and disruptive tags.

CodeZapper can be a big help with any translation environment which displays tagging in some way. These include OmegaT, Déjà Vu, memoQ and the various Trados instances. For whatever reason, no tools vendor has seen fit to create a quality management tool of this same caliber, though Kilgray at least partially addressed this with a memoQ filter option that often does help with trash tags.

Version 2.8 of CodeZapper is currently available by direct request to the author, Dave Turner. There is now a separate "read me" file explaining the functions of the macro buttons in some detail.

If you benefit from this tool, please support its creator. I do. He has saved me many, many hours of tribulation in translation, far more value given than the little money he has received from me. Here is the first part of Mr. Turner's documentation to give more background on this useful tool:

What is “CodeZapper”?

"CodeZapper" is a set of Word macros (programs written in VBA to automate operations in applications) designed to “clean up” Word files before being imported into a translation environment program such as Deja Vu DVX, memoQ, SDL Trados Studio, TagEditor, Swordfish, OmegaT, etc.

Word documents are often strewn with junk or “rogue” tags (so-called “smart tags”, language tags, track changes tags, soft hyphenations, scaling and spacing changes, redundant bookmarks, etc.).

This tagged information shows up in the DVX or MemoQ grid as spurious {1}codes{2} around, or even in the mid{3}dle of, words, making sentences difficult to read and translate and generally negating many of the productivity benefits of the program.

OCR’d files or files converted from PDF are even worse.

CodeZapper tries to remove as many of these tags as possible while retaining formatting and layout. It also contains a number of other macros which may be useful before and after importing files into DVX or MQ (temporarily transferring bulky images (photos, etc.) out of a file, to speed up import, and then back in the right place after translation, moving footnotes to a table at the end of the document and back after translation, for example).

Is it freeware?

No. To help ensure its continued availability and improvement, there is now a one-time, 20 euro charge for the program. This will entitle you to free future upgrades.

Is it risk free?

Although it’s been fairly extensively tested on a range of files, you should obviously only use it on a backup copy of your files and at your own risk.

How do I install it?

CodeZapper comes in the form of a Word template (.dot file) with a custom toolbar which you can either copy to the Word startup directory (following the path in Tools/Options/File Locations/Startup), in which case it will be enabled on starting Word, or to the “Template” directory containing Normal.dot and other Word templates (following the path in Tools/Options/File Locations/User templates). You then enable it by selecting it in Tools/Templates and Add-ins, as and when needed.

Oct 5, 2011

Notes from the XLIFF symposium in Warsaw

Some months ago at memoQfest 2011 in Budapest, I was invited to attend the TM Europe conference in Warsaw, Poland, at which the Interoperability Now! initiative would be discussed. Although the initiative and its goals interest me very much, I was undecided about attending until a friend said simply that we should go, and the matter was decided. I had very little time to think about it after that, and in the end I came a day late and left a day later than planned, and two days after that I've still not made it home to my pigeons, Ajax and I having been kidnapped by an alien. My lack of real planning caused me to miss the XLIFF symposium on the first day, about which I heard many things from its participants, mostly positive. As the main conference was the most interesting event of its kind which I have ever attended, most notably for the quality of participants and presenters alike, I was quite intrigued to be told time and again that I should have heard some particular matter at the XLIFF symposium, because the information shared was so much deeper than the particular gold nugget I was admiring.

Thus I asked a new friend if he would be kind enough to share his impressions of the day I missed so that I might enjoy subsequent vicarious attendance and that others might share some of the lessons he took home with him. A single event is many when seen through different eyes, and this is one man's journey through the day. If you attended a different conference, please tell us about yours.

And now, without further ado, I offer you the notes he shared with me, and I thank him for his anonymous insights.

***

Here are the thoughts on a few of the presentations that struck a note.

---

The most interesting translation buyer presentation was given by Kirill Soloviev, who runs localisation at Acronis, a small software company. He described the overhaul of his company's software localisation process, from sending a non-standard (simple text-based) resource format to the translation vendor, to converting to XLIFF in-house and sending that instead.

To me, the eye-opening part is not the new, XLIFF-based process, which seems sensible and straight-forward. It's the fact that the old process ever existed in the first place. I know it happens all the time, but it still baffles me that anyone would send an ill-specified in-house file format directly to a translation vendor, with a prayer of getting it back intact. I guess when it comes time to translate, software companies underestimate how fragile their file formats are—or overestimate the engineering competence of their LSP. You can make it work, but (as Kirill showed) at the great price of time, money, and frustration.

The problems were what you would expect. Up-front, the LSP charged engineering for the non-standard files. Acronis was locked into the only LSP that understood them. The files came back broken anyway. Often, this required sending them back to the vendor. The release was delayed unpredictably. Everyone was frustrated.

Kirill quantified the costs (except the frustration), which I appreciate. I don't have the numbers handy, but the engineering cost was the least significant. That alone would not justify the overhaul for years. (For Acronis, that is—given the number of back-and-forths required, the LSP probably lost money on engineering.) Beating lock-in was bigger. Acronis could now shop around. But more important than money was time. Testing corrupt files and back-and-forth with the LSP wasted employees' time and delayed releases. I wouldn't be surprised if they missed ship dates, though Kirill didn't say.

Whose fault was it? The buyer takes some blame for not foreseeing the misinterpretation and corruption of their format. This is typical of the somewhat sloppy internationalisation practices of software developers. But it's understandable, because localisation is not their business. The vendor takes the weight of the blame, in my opinion, for accepting the job without completely understanding the format. LSPs seem to assume it is their problem to deal with random formats, yet they overestimate their technical ability to do it correctly, and fail to set expectations (and pricing) accordingly. If this were estimated diligently before signing the contract, both parties would probably see it's cheaper for Acronis to process their in-house format themselves.

Lesson: The more standard the format you send to the translation vendor, the better results you should expect. XLIFF is the logical extension (in theory at least).
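To give an idea of how small the in-house conversion step can be, here is a minimal sketch that turns a simple key=value resource file into XLIFF 1.2. The Acronis format was not shown in detail, so the key=value layout, the "#" comment convention, the file name and the function name are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def resources_to_xliff(lines, original, source_lang):
    """Build an XLIFF 1.2 document with one trans-unit per key=value line."""
    ET.register_namespace("", XLIFF_NS)
    root = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.2"})
    file_el = ET.SubElement(root, f"{{{XLIFF_NS}}}file", {
        "original": original,
        "source-language": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", {"id": key.strip()})
        ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = value.strip()
    return ET.tostring(root, encoding="unicode")


resource_lines = ["# main menu", "menu.open=Open file", "", "menu.save = Save file"]
print(resources_to_xliff(resource_lines, "strings.txt", "en"))
```

A real converter would also need the return trip (XLIFF back to the resource format) and a decision on how to represent placeholders inside the strings, which is where most of the effort usually goes.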

One other observation is that a pure-XLIFF delivery appears to work well for resource files because they have no context to preserve. So converting to XLIFF doesn't lose anything. But this success is somewhat illusory, because (as Kirill acknowledged) the lack of context is a problem for translation quality. This is an important problem that neither Acronis nor XLIFF has solved.

---

On XLIFF itself there were two panels, which I might title "The Problem" and "The Solution". "The Problem" I might subtitle with a quote from panelist Angelika Zerfass: "What do you mean? What are we supposed to do?" Angelika trotted out a menagerie of freakish XLIFF that would make the stoutest Technical Committee member shiver. Angelika is a localisation trainer, and all of these mutants come from her real-life experiences.

This led to a lively exchange, dominated by echoes that XLIFF is not clearly defined, and implementations regularly get it wrong. Producers see XLIFF as a great place to dump data, without realising how many assumptions they are making about how the data will be treated.

Angelika called for education, guidance, examples, and use cases for XLIFF producers from the standard, and nobody disagreed that this is sorely lacking. The point I would question is her claim that producers need to understand the process that their XLIFF will go through. I believe if XLIFF were well-specified, XLIFF producers could be confident about how their data will be treated—and what they can expect back at the end—while remaining relatively ignorant of the translation process. But this may be hopeful speculation on my part.

Daniel Benito (creator of Déjà Vu) made some points about the lack of workflow information in XLIFF, and claimed that alt-trans was designed without thought for how tools would deal with it. This led to a notable comment from someone not on the panel that XLIFF was designed without tools developers involved at the beginning. That explains a lot.

Shirley Coady (MultiCorpora) noted that the lack of a compliance test for XLIFF allows producers to create whatever they want and call it XLIFF. LSPs (and even tool vendors) seem to accept they are stuck dealing with it. This is the same backwards thinking I pointed out regarding Kirill's talk. LSPs should push back—but they could use a stronger standard, with compliance checking, to back them up. With strong standards, they can show clients that their XLIFF is wrong, and educate them on how to fix it. The world becomes a more harmonious place. (As someone from the audience noted, we could use more translators and PMs in the standards process to bring about this happily ever after.)

---

"The Solution" was a panel of XLIFF Technical Committee members David Filip, Shirley Coady, and Lucia Morado Vazquez, who explained how XLIFF 2.0 will bring about a golden age of localisation interoperability. OK, so that's a little optimistic, but let's look at what hope they offer for the future.

David cited the new requirement for a conformance clause in OASIS standards as an opportunity. He advocates a conformance clause that mandates processing requirements for every element. With XLIFF 2.0, you either conform or you don't; there is no ambiguity. Everyone seems in favor.

Also widely agreed is that XLIFF 2.0 will have a simpler core, with additional functionality enabled by modules. But the line between core and module has not been drawn. David indicated that the distinction may be guided by considering a set of core use cases. He pointed to the model of the UBL standard, which begins with 40 pages of "business processes" that are supported by the specification.

Some important questions remain. The first is how backwards-compatible XLIFF 2.0 will be. Shirley made a statement implying that 2.0 wouldn't be much of a change for tool developers. But I didn't hear consensus on this point, and it seems at odds with the goals of simplification and tighter specification. An earlier presentation showed parts of the current XLIFF 2.0 schema, and they had clearly diverged from the current standard. So the story on compatibility is unclear. There's a lot of fervor for a fresh start in 2.0. And maintaining backwards compatibility while at the same time removing ambiguity seems a major undertaking. But are implementors willing to rewrite their XLIFF support? I'm not sure anyone knows.

Also missing was any project data on XLIFF 2.0. XLIFF 1.2 was published in February, 2008, over three and a half years ago. Yet there's not much visibility from the outside on the progress towards XLIFF 2.0. David estimated 2.0 in the second quarter of 2012, but I'm not sure how he arrived at this. David did call for feature owners to commit the resources necessary to do the work. And Lucia asked for more industry contribution. But neither answers the question of who is signed up for the core work. I hope I've just missed something, but I'm suspicious of the current dates given the scope of the goals.

Finally, I can't help noting the absence of most of the XLIFF Technical Committee from the Symposium. Not to dismiss the contributions of the others present, but David Filip was the only representative to make many meaningful statements about work being done. Missing were most of the heavy hitters: Committee Chair Bryan Schnabel, Yves Savourel, Rodolfo M. Raya, Arle Lommel, Christian Lieske (due to last-minute emergency), and anyone from IBM. I don't mean to blame them. Budgets are tight, and it's hard to justify conference expenses. But it was a lost chance for the TC to communicate its message, and for the industry to engage them. With so much uncertainty around XLIFF 2.0, I think that's a real missed opportunity.

---

And in the miscellaneous category was a fascinating talk by Ian O'Keeffe. Ian made a compelling case for the increased importance of sound (sound effects, background sound, music) in virtual experiences, and predicted that sound will increasingly be localised to convey the intended meaning to other cultures. He's written a clever tool to vary a piece of music (tempo, pitch, rhythm, ...) and study how listeners from different cultures hear the differences. His provocative idea is that perhaps, instead of localising sound by choosing or composing a different sound, we might modify the original to evoke the right response.

He ties this to localisation and XLIFF by pointing out that whatever method is used, we need to know the intended meaning of a sound in order to localise it. He proposes putting metadata about the source sound into XLIFF. This is all fine, but sound is a red herring. We need translation unit metadata for all sorts of reasons. And we already have it. XLIFF 1.2 even enshrines some rather (ridiculously) specific metadata (coord attribute, anyone?). Moreover, it allows for extension within the tag, so anybody can put there whatever metadata they want. While there were a few extension-phobes in the audience, I don't see what alternative they could offer, or what harm they see in this kind of use.
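As a rough illustration of the extension route, the sketch below attaches a piece of sound metadata to a translation unit using an attribute from a separate namespace, which XLIFF 1.2 permits on trans-unit elements. The namespace URI, the "snd" prefix, the attribute name and the function name are invented for this example and come from no published specification.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
SOUND_NS = "urn:example:sound-metadata"  # hypothetical extension namespace


def build_sound_unit(unit_id, source_ref, intended_meaning):
    """Create a trans-unit carrying a custom-namespace attribute as metadata."""
    ET.register_namespace("", XLIFF_NS)
    ET.register_namespace("snd", SOUND_NS)
    unit = ET.Element(f"{{{XLIFF_NS}}}trans-unit", {
        "id": unit_id,
        f"{{{SOUND_NS}}}intended-meaning": intended_meaning,
    })
    ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = source_ref
    return unit


unit = build_sound_unit("alert-01", "sounds/alert.ogg", "urgent warning")
print(ET.tostring(unit, encoding="unicode"))
```

A tool that does not know the extension namespace can ignore the attribute, which is the usual argument for this kind of use. Whether every tool preserves it on the round trip is a separate question.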

Oct 4, 2011

Berlin translators' meetup in October

Dear all,

Here is the invitation to the next translators' meetup on:

Thursday, 6 October 2011, from 8:00 p.m.

Right on time for the start of autumn, with a new destination:

Gasthaus Figl

Urbanstraße 47

10967 Berlin (Kreuzberg)

U-Bahn: Schönleinstraße or Hermannplatz

www.gasthaus-figl.de

The pretty little website already sets the theme: "Pizza, Bier und Stoffserviette" (pizza, beer and cloth napkins). And the restaurant is just as likeable as it seems at first glance. The highlight is the two skittle alleys in the cellar, one of which I have reserved for us.

See you Thursday!

Andreas Linke

Looking ahead:

The meetup after that will take place as usual on the first Thursday of the month, namely on 3 November 2011.
