Jun 14, 2010

Foolish tools

No matter what translation environment tool is used to work on a text like the one below, the experience can be an unpleasant one. memoQ's cell scrolling feature makes this horrible case just barely tolerable. It's one of the biggest tag salads I've seen in a while. In a "future scenario" I would like to see some option for condensing messes like this so that the real work of translating the text can be accomplished better. Usually I see garbage like this in INX (InDesign) files. This particularly charming case is a TTX made from a DOCX and imported into memoQ for reasons that are too hair-raising to explain.

10 comments:

  1. There is a tool called CodeZapper that melts these things away (Word only!), available on the MemoQ-Yahoo Groups page.

  2. Yes, I'm aware of CodeZapper and have plugged it here a number of times. However, these aren't rogue codes but part of a totally over-the-top use of bookmarks, footnotes, index entries and cross-references in the document, and I'm not sure it would help in this case. Earlier versions got me into trouble from time to time with changes to such structures.

  3. How about making the translation tool treat these tags as external/segmenting so that they don't appear inside segments? I'd be surprised to see any file format that would contain 300+ tags, all of which should be defined as inline/internal.

    If the translation tool that claims to support this INX file still defines these tags as internal in the default INX filter/parser, the tool developers should probably re-think and fix that.

  4. @langtechie: Yes, SDL Trados 2007 (the original tool used to process the TTX before it was imported to memoQ) does have issues. Unfortunately I don't dare define that tag as external, as it appears (in smaller numbers) in the middle of many sentences.

    However, due to subsequent problems with the file, I re-did the project by prepping it in SDL Trados Studio 2009 and noticed that where all the crap above is located there is just a single tag. Much easier to deal with.

  5. CodeZapper is frequently trotted out as a solution to tag madness, but it's never solved any of the tag issues I've faced since starting to use memoQ. It's generally been a waste of the two minutes spent running it.

    Now that I'm using Ver. 4.2 the number of crappy purple tags is greatly reduced. But I still wish that instead of the whimsical "Don't press this button" (not funny anymore if it ever was), there were a "Get these frigging tags out of my face and let me deal with the consequences as I see fit" button.

  6. Rod, it wouldn't surprise me if the peculiarities of 2-byte fonts were somehow related to the shortcomings of CodeZapper for your purposes. Just a guess on my part, but the creator translates from French and made the macros in his "spare time" to deal with a common problem for users of DVX (and other tools). As you've noticed with other things, if developers don't work with particular configurations, it's often difficult for them to get a handle on problems in those areas. And in Dave's case, why should he, frankly? He doesn't need it, most others don't, and it's a free tool. Anyone wanting to rewrite the macros to work on Japanese OCR texts is welcome to do so, I'm sure.

    Did you see that feature in mQ 4.2 for ignoring "minor formatting tags" in a DOCX? Very, very useful in several cases I've faced, so much so that I'm converting a lot of DOC files to DOCX to simplify the jobs.

  7. I haven't seen the "minor formatting tags" function yet, and nobody sends me DOCX (conservative lot the Japanese). But there are far fewer tags appearing than in the earlier versions, so I'm reasonably happy about that. I generally get very uncomplicated documents where no tags of any kind are really required (but that doesn't mean I want to convert the documents to text first - I'll take the 'hide all tags' button please).

  8. Just for laughs, Rod, try converting your DOC files to DOCX and using that function in the "Import As..." dialog. Of course you'll have to re-save the DOCX as a DOC later, but you might be surprised with how much of the remaining junk disappears this way.

  9. Slightly off topic but maybe useful to somebody: I'm having great results converting DOC files to DOCX, extracting them to XLIFF with Tikal (Okapi) and translating them in OmegaT. With this method tags don't bother me anymore, and I don't have to be afraid of breaking the formatting during DOC to ODT conversion.

  10. Yes, a tag jam is a horrible thing. It's ugly to look at and may crash the CAT tool when you try to deal with it. Even worse, the file may be damaged. In my experience, most of the tags in the jam are useless formatting tags.

