Mar 3, 2011

Homogeneity: another "secret" competitive weapon with memoQ

Earlier today I received an e-mail with the following question:

At the moment we are wrestling with an analysis issue that should be solvable but we don't know how to. As I always see your posts about all kinds of TM issues, I was hoping you might be able to provide some advice.


The case is as follows:
From one of our clients we have received what is basically a list of tools in Excel for translation (NL-FR). My colleague made an initial quotation for the project based on the Trados analysis, which revealed 23% repetitions in the file. However, the client received a much lower quote from a different provider. The reason for this, according to him, is that there are a lot of high fuzzy matches in the file which the other provider has counted but Trados doesn't (for example, "... metaalzaagbeugel 12 inch zwaar model met D-greep" and "... metaalzaagbeugel 12 inch zwaar model met rechte greep".)


Do you know whether there is a way (or tool other than Trados) that does count these fuzzies when performing an analysis?

To me, this sounds an awful lot like my PM acquaintance has been blind-sided by Kilgray's homogeneity analysis, which has been a feature of memoQ for a very long time. It's a feature about which I personally have mixed feelings. Used in the wrong way by unscrupulous agencies or ignorant persons, it can be yet another club with which to clobber translators and their rates to the ground and bring about the Hobbesian state of being so many fear is in our future, if not our present. But I approach it as a valuable information tool for helping me estimate how much time a rush project might actually take. Or in the case of my correspondent's competitor, it can be used judiciously to calculate a competitive rate that might not land you in the poorhouse.

Classic Trados and most other CAT tools calculate fuzzy matches based on the content of a translation memory. If these sentences:
The cat is black and white.
The dog is black and white.
The rabbit is black and white.
do not have something similar in a TM used for analysis, they will all be counted as "No Match" segments. However, with a good tool like Atril's Déjà Vu X and its functional "assembly" technology, similar sentences like these are handled almost like 100% matches from a TM. But DVX still won't tell you about the time you might save.

Kilgray's memoQ analyzes a text for internal redundancies and "fuzzy redundancies", the latter being referred to as having a degree of "homogeneity". But as anyone who works with CAT software knows, even high fuzzy matches can be utterly useless and cost more time than content with no statistical similarities. Translation is about meaning, not statistics, and the price assassins at Trados and other tool pimps of the past sold everyone a lousy bill of goods with nonsense marketing lies like "You'll never have to translate the same sentence again." Well, guess what? If you do successive versions of an information brochure or technical manual and don't start to update your language after a while, your text will soon sound like it was written for an age long past and might not communicate as clearly as it should. Those who can read German should have a look at the various editions of the classic cookbook Die Süddeutsche Küche by Katharina Prato, which was popular from the mid-19th century until the 1930s, for truly dramatic examples of the changes in a language. (These are available online via Google Books and various libraries. They are also a good source of offal recipes - people ate all manner of interesting things back then.) But this happens on a much shorter time scale as well: my eight-to-ten-year-old texts for the AOK social insurance brochure and various IT manuals sound rather awful and dated, though they were quite acceptable at the time they were written.

Used as a planning tool, however, the homogeneity function in memoQ can give you valuable information and help you compete more effectively in difficult times and markets.
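
For anyone curious what a homogeneity-style count involves, here is a rough Python sketch of the idea. It is not Kilgray's algorithm, which I don't know. The similarity measure (word-level matching from Python's standard difflib module) and the function names are my own assumptions for illustration. It scores each segment against a TM and, optionally, against the segments that came before it in the same text.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1]; difflib's measure, not any CAT tool's."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def analyse(segments, tm=(), internal=False):
    """Return the best match score found for each segment.

    internal=False: compare against the TM only (classic analysis).
    internal=True:  also compare against earlier segments of the same text
                    (a homogeneity-style analysis).
    """
    scores = []
    seen = []  # segments already "translated" earlier in this text
    for seg in segments:
        pool = list(tm) + (seen if internal else [])
        scores.append(max((similarity(seg, c) for c in pool), default=0.0))
        seen.append(seg)
    return scores


segments = [
    "The cat is black and white.",
    "The dog is black and white.",
    "The rabbit is black and white.",
]

tm_only = analyse(segments)                       # empty TM, no internal matching
with_internal = analyse(segments, internal=True)  # empty TM, internal matching on

for seg, a, b in zip(segments, tm_only, with_internal):
    print(f"{seg:32} TM only: {a:4.0%}   with internal: {b:4.0%}")
```

With an empty TM, the classic analysis scores all three sentences at 0%, while the internal comparison scores the second and third at about 83% by this particular measure, since five of the six words match a sentence already seen. Real tools will weigh things differently, and none of this says anything about how useful such a match is to the translator.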

7 comments:

  1. Hi Kevin, little did you know (nor did many others) that SDLX and Trados had these features for a very long time too :-) Studio reintroduced this option some time ago because of demand from SDLX users.

  2. Trados? I think you may be mistaken about that Paul. Which versions offer the homogeneity analysis? I was unaware of SDLX in this regard; I'll have to have a look at one of my old Trados installations to see how this works. That would be an option for some perhaps. Although I was a licensee since version 3 or 4, I never understood the enthusiasm for SDLX. The format painting feature was confusing bullshit to me, and after my ex, an SDLX fan and translator for SDL years ago, told me how fed up she was that the SDL language team couldn't deal with her new upgrade and was still stuck on the incompatible old version, I gave it all up as a loss.

    I had forgotten, however, that you mentioned homogeneity in Studio 2009 to me a while back. Thanks for the reminder; an old man needs these once in a while.

  3. Hi Kevin,

    Maybe not exactly the same... you be the judge. But the use previous TM feature in Workbench is similar. It's used to simulate the availability of a TM, obtained through alignment or partial translation of a project. It is quite an obscure feature and whilst it is theoretically possible to use it to compute the project-internal leverage, I don't think it's documented or understood well enough to be actually used for that purpose, other than by power users.

  4. That's a new one on me, Paul. Thanks for the tip; I'll try it out, compare the results and report.

  5. Just a little more on this. The idea behind it (which in some ways is an extension of the homogeneity idea already in SDLX back then) is that you’d “simulate” a full or partial translation of a project by running the next analysis against the temporary pseudo-TM created by the last run. So by extension I mean that this provided for the ability to split up your files and use “from previous analysis” TMs to optimize translation order, i.e. maximize internal leverage even if the project was split and no shared TM would be used during project execution.
    It's interesting because a few users have recently asked whether we would add this into Studio in addition to the feature sets already there... actually you can simulate this in Studio quite easily anyway.

  6. Nice article. I agree that translation is about meaning, but this being a business, there's no way around the statistics (i.e. money). It is true that language changes over time, but I think (and this might be a truism) that saving money and effort by not reinventing the wheel is not incompatible with reviewing pretranslated content, with every project or with a certain periodicity, at a lower rate. I think it makes sense both for translators and clients. Of course it depends on your kind of content as well -- for certain domains it's not only a matter of savings, consistency is a must.

  7. Hi Kevin, hope you don't mind, but I lifted a piece of your post and stuck it over here: http://www.proz.com/forum/memoq_support/289898-make_sure_agencies_leave_homogeneity_switched_off_when_running_statistics_in_memoq.html

    Michael
