Aug 3, 2013

memoQ&A: How do I leverage the pretranslated SDLXLIFF content?

Given the interesting and surprising answers I received in previous two-stage "quiz posts" in which a challenge was posed for others to answer before I present my approach, I have decided to try a series of such posts. I've polled a few friends about a possible name for this series - memoQuiz, memoQ&A, CATquiz or perhaps something else. The first two choices suggest that memoQ would be the focus, but despite the impressions some may have of my publication habits, memoQ is far from my only concern with productivity involving the software we use for translation processes. So I'll leave that question open for now and use the current "vote leader". Arguments for and against in the comments are welcome.

Today's "quiz" is inspired by my continuing research into the current status of interoperability between SDL Trados Studio and memoQ 2013. As Kilgray has continued to upgrade the quality of its filters and other features for working with files from other platforms, SDL advocates have been increasingly at pains to find the rare, exceptional cases that do not work well or at all and present these as "common" and proof that we should all just bow down and kiss the One Ring ;-) The latest variant of that theme which I saw involved tracked changes displayed in source segments of the translation grid. It was fascinating, really, but a bit bizarre and utterly outside anything in my experience with 13 years of commercial translation. I'm not about to torture myself with an unergonomic application if a simpler one covers most of my professional needs. The Pareto principle rules.

Here's the scenario:
  • You receive a pre-translated SDLXLIFF file with segments of various status. Some are pretranslated fuzzy matches, some are not-yet-approved or even rejected pre-translated segments, and some are ones which the outsourcer confirmed (but did not "approve") before sending the file. And some segments have not been translated at all.
  • The outsourcer just went on holiday and forgot to send you the translation memory!
  • You want to be able to use the pretranslated and approved content in the SDLXLIFF file as a reference while you translate this file and others. How can this be done???
  • Here is the file to translate. It is an English source text being translated into German.
The file to translate as seen in SDL Trados Studio.

Thank you to those who contributed their suggestions in the comments! Here is how I approached the problem:
The bilingual file I was given has different "qualities" of translated segments. There are unconfirmed (and possibly dodgy) sentences, including a "rejected" 100% match, translated (confirmed) segments and approved (proofread) segments. A TM in memoQ gives me no opportunity to differentiate match quality based on row status. LiveDocs does!
So I send the SDLXLIFF file I received to a LiveDocs corpus on a "temporary" basis, where I use special settings to apply a fairly heavy penalty to unconfirmed segments, a mild penalty to translated (confirmed) but unapproved (not proofread) segments and no penalty at all to the parts which have already been checked and approved.
Details of the settings configuration and an example of how these settings apply to the SDLXLIFF file used as an example are shown in the video below. A similar approach can be applied to any bilingual file (or translation stored in LiveDocs) where there may be significant differences in segment status.
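
For those who prefer to see the logic spelled out, here is a minimal sketch in Python of how status-based penalties affect the ranking of matches. It is not memoQ code and does not reproduce how LiveDocs calculates anything internally; the status names, the function names and the penalty values (15, 5 and 0 points) are assumptions chosen only for illustration.

```python
# Hypothetical penalties in percentage points per row status.
# These values are illustrative assumptions, not the settings from the video.
STATUS_PENALTY = {
    "unconfirmed": 15,  # fairly heavy penalty for possibly dodgy rows
    "confirmed": 5,     # mild penalty for translated but not proofread rows
    "proofread": 0,     # no penalty for checked and approved rows
}


def penalized_match(raw_match: int, status: str) -> int:
    """Return the match rate after subtracting the penalty for the row status."""
    return max(0, raw_match - STATUS_PENALTY[status])


def best_hit(hits):
    """Pick the (raw_match, status, target) hit with the highest penalized rate."""
    return max(hits, key=lambda hit: penalized_match(hit[0], hit[1]))


if __name__ == "__main__":
    hits = [
        (100, "unconfirmed", "draft translation"),
        (95, "proofread", "approved translation"),
    ]
    for raw, status, target in hits:
        print(status, penalized_match(raw, status), target)
    print("preferred:", best_hit(hits)[2])
```

With these assumed values, an unconfirmed exact match ends up ranked below a proofread 95% match, which is the kind of differentiation a translation memory without row status cannot offer.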


Time index to the video tutorial:

0:30  Creating a new LiveDocs settings profile
1:05  Editing the new LiveDocs settings profile
1:29  Match threshold settings
2:16  Alignment penalties
3:01  Bilingual document penalties
3:45  Penalty for unfinished alignments
4:24  Sub-language difference penalty
4:57  A "tour" of the row status for segments in the SDLXLIFF
6:20  Adding the translation file to the LiveDocs corpus
7:14  Applying the new LiveDocs settings to the LiveDocs corpus used
8:00  How the new LiveDocs settings work for matches in the translation window
9:22  Advantages of using LiveDocs rather than a translation memory

9 comments:

  1. Approach 1:
    Create view containing only the confirmed segments. Select all segments in the view and confirm them into your own memory.
    Approach 2:
    Create an XSLT stylesheet to transform the sdlxliff file into TMX and run it using Okapi Rainbow.

  2. Maybe I did not understand it, but you might have two possibilities:

    1. LiveDocs: In LiveDocs you are able to see the comment without being forced to open it or hover the mouse over it. On the other hand, you are unable to sort the document by state (okay with 25 segments, but not with 25,000).

    2. Simply import it. memoQ 2013 shows the segment state better than Trados (2009). Sort by state (the fast way, but it means working w/o context). Confirm the approved segments (depending on whether you are confident enough...), check the unapproved 100% matches. If you do not have a TM, it remains a "problem" to use the fuzzies - they have to be checked & translated one by one. And LiveDocs "fails" - it indicates a 101% match.

  3. Addendum: To use the fuzzies, you can produce a view with only the pretranslated segments and confirm them (to a temp TM, not to the project TM). Then export this to mqxliff and add this file to LiveDocs, but with a punishment of 10-30% in the LiveDocs options to avoid any unwanted 100% matches.

    This way you get at least "artificial" fuzzies for the project.

  4. Thank you for your suggestions. Each will work in its own way, though as you both noted, the differences in row status make some selectivity and differentiation desirable.

    @TvNellen: You are basically correct about how to get the rows into the translation memory, but this is sort of a special case, and the usual way many people might try to do it simply will not work. For example, I often select all rows and then simply use my keyboard shortcut for confirming. That will not work with these proofread segments. You'll have to use Operations > Confirm And Update Rows... and then choose the appropriate settings in the dialog. This is a useful dialog to know. As for XSLT... well, despite my fondness for that in some cases (like transforming the XML term output from memoQ into something pretty to give to clients), it's way too much work in this case, I think, unless you are going to be doing this a lot.

    @Torsten: You were on the right track to suggest LiveDocs. Your concerns about LiveDocs "failing" will be addressed by differentiating the penalties in the bilingual document based on the row status! This is one of the great advantages of LiveDocs over a TM as I mention in the video tutorial.

  5. @TvNellen: I've now added a video to my YouTube channel to cover the case you suggested (writing to a TM based on row status). The URL is http://youtu.be/3HtEauNGtBk. Thank you!

  6. Thanks for the video Kevin, your method is indeed easier. I never knew that functionality existed. XSLT is of course a silly suggestion, but you have to admit it's more fun...

    Replies
    1. There is so much functionality in most modern software and so many potential ways to apply it to the huge variety of cases we encounter that one can hardly expect to be aware of it all or even remember what one encounters unless the experience stands out in some way. In fact, many of my past blog posts were written as personal reminders, and when I need to deal with a problem, I often discover in a Google search that I described that very problem several years before... then forgot about it. But with a lot of the CAT tools, I think much of the problem lies with the generic, feature-oriented way they are taught. We need more good, scenario-based examples from real work.

  7. @Kevin: Indeed you found some shortcuts to work more efficiently (I have been working with memoQ since 2007; I should really take a closer and more relaxed look at every new feature, not only the obvious ones). Anyway, shouldn't it be possible to read out match percentage values from bilingual files and apply any additional penalties (oh yes, they are penalties, not "punishments", OMFG) to get better results? In your solution on YouTube (and similarly in my suggestion) any (unconfirmed) fuzzy between 99% and 70% gets 85%. Wouldn't it be nicer to know whether a fuzzy was 61% or 99% (minus your penalty, of course)?

    The match percentage is inside the sdlxliff file, so it should be possible to add this feature and to read it out in LiveDocs (as in the translation pane when I import this file).

    Replies
    1. No, Torsten, you can't do anything with the match percentages per se in those unconfirmed segments. Given the very different way matches are calculated in each tool I think it might be a bit dangerous and unproductive to go down that path. Just look at the difference between what a 98% match can mean in Trados and memoQ! In memoQ I know that means some little thing like formatting or perhaps a number. In Trados that can be differences in words, like the inclusion of a "not", which changes the whole meaning of the sentence. (Well, number differences can too, but that's a rant for another day). I have noticed over the years that memoQ has trained my mind to categorize and process fuzzy matches more effectively than when I used to do a lot of actual translation using Trados.

      However, there is one point you reminded me about here that I plan to investigate in some detail. It seems that in some cases there are fuzzy match scores from SDL Trados Studio which get lost when the file is finished and goes back to Trados. I have not seen that yet myself - quite the contrary - and I suspect this may be due to another tester's insistence on showing me how tracked changes for source texts work in Studio (very interesting, but not relevant to how I typically work, though I do see potential applications). Anyway, be aware of that possibility, though its real consequences will be zilch in most cases. I'll study the problem more and perhaps discuss it another day.

