Oct 27, 2015

Beware the document Reimport trap in memoQ!


In between sneezes and hot shots of gingered lime tea, I saw the Skype icon on my Windows task bar change to indicate a message: a distress call from a financial translator friend who had just received a new version of the Q3 report she was translating. memoQ has excellent version management features, including a document-based pretranslation (X-Translate) that uses a current or previous version of a translation to identify unchanged, already translated sections when the client sends a new version. This avoids potential confusion with undesired matches coming out of any of the many translation memories or LiveDocs corpora which might be attached to a project.

This time, however, memoQ seemed to be getting weird on her, with error messages referring to ZIP archives and password protection. Her customer's file was not password protected, and as far as she knew, there was no ZIP archive anywhere in sight. She was dealing with "ordinary Word files". I have no idea what those are, but I hear about them often enough, and that is often where the trouble starts.

Last July I was teaching a week-long introductory course to memoQ in Lisbon, and when I wanted to show the course participants how this X-Translate feature worked, everyone ran into unexpected problems. When it was first introduced in memoQ, I noticed that the updates would work in any format. A translation which starts out as a script in a word processing file might later be updated as a set of presentation slides, and memoQ's document-based pretranslation did an excellent job of enabling me to focus quickly on the new material. It still does, but since the early days, some advocate of unintelligent programming decided that the filter used for the Reimport function to bring in the updated source text should assume that the source format was unchanged from the previous version rather than simply offer an appropriate filter for the current format. One must specify the filter to be used for an updated version if this assumption is not correct (as I also explained in my book New Beginnings with memoQ shortly after noticing this).

I can probably guess why this was done. With certain filters, the filter to use is not obvious from the extension (the multilingual delimited text filter, for example, if it is needed), or there may be a custom configuration of an "obvious" filter needed. In these cases, the assumption of using the last filter settings makes a lot of sense. However, if there is a change of format, where it is clear that the new filter should not apply, then some action should be taken other than a virtual assault on the user with mysterious error messages.
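For readers who think better in code, here is a minimal sketch of the kind of selection logic I have in mind. It is not memoQ's actual implementation; the function name, the filter names and the extension table are all hypothetical and serve only to illustrate the idea of keeping the old settings when the format is unchanged and falling back to a sensible default when it is not.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical extension-to-filter table; these are NOT memoQ's real filter identifiers.
DEFAULT_FILTERS = {
    ".docx": "word-docx",
    ".doc": "word-doc",
    ".rtf": "word-rtf",
    ".pptx": "powerpoint",
    ".xlsx": "excel",
}


def choose_reimport_filter(previous_name: str,
                           previous_filter: str,
                           new_name: str) -> Optional[str]:
    """Pick a filter for a reimported document (illustrative assumption only).

    Same extension as before: reuse the previous filter, since it may carry
    custom settings. Different extension: use the default for the new format.
    Unknown extension: return None so the caller can ask the user.
    """
    old_ext = PurePath(previous_name).suffix.lower()
    new_ext = PurePath(new_name).suffix.lower()
    if old_ext == new_ext:
        return previous_filter
    return DEFAULT_FILTERS.get(new_ext)
```

With this sketch, a DOCX replaced by a DOC would get the hypothetical "word-doc" default instead of the old DOCX filter, while a custom-configured filter survives as long as the extension stays the same.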

In the case of my financial translator friend, the update came as a DOC file, where the original had been DOCX. Geeks who have nothing better to do with their time might know that DOCX files are actually renamed ZIP files, so at least the confusing error message above was "truthful" in a sense.
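If Windows is hiding the extension, the first few bytes of a file will still give the game away. The following is a small Python sketch, not anything memoQ provides: the function names are my own, and the check relies only on the well-known file signatures for ZIP archives (which DOCX, XLSX and PPTX files are), the older OLE2 compound files (legacy DOC, XLS, PPT) and RTF.

```python
# Well-known leading byte signatures ("magic numbers").
ZIP_MAGIC = b"PK\x03\x04"                          # ZIP local file header: DOCX, XLSX, PPTX
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # OLE2 compound file: legacy DOC, XLS, PPT
RTF_MAGIC = b"{\\rtf"                              # Rich Text Format


def sniff_container(data: bytes) -> str:
    """Classify a file by its first bytes, ignoring whatever the extension claims."""
    if data.startswith(ZIP_MAGIC):
        return "zip"
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(RTF_MAGIC):
        return "rtf"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just the first 8 bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff_container(fh.read(8))
```

A result of "ole2" for the new version of a document that was originally imported as DOCX would indicate exactly the kind of format switch described here. Note that this tells you only the container type: a ZIP signature does not distinguish a DOCX from an XLSX or from an ordinary ZIP archive.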

I see this sort of "switch hitting" with Microsoft Word file formats of various generations, or changes from RTF to DOC or DOCX, rather often. But in the case of importing new document versions, these changes mean trouble for memoQ if the user does not notice the difference. Given that the majority of working translators I have encountered who use Windows operating systems never fix the default system setting which hides the extensions of known file types, the chances that your average mortal wordworker will figure out this problem are just about zilch.

Armed with new insight into the problem, my friend was able to import the new document version successfully by specifying the appropriate filter manually and then use X-Translate to get her previous translation applied to sections of source text which had not changed (so that inappropriate 100% matches from a TM or LiveDocs corpus could be avoided). But for the future, I hope that Kilgray will apply a little more intelligent logic to the selection of filters for the document Reimport function of memoQ.

7 comments:

  1. Nothing to do with reimporting, but in respect of memoQ's docx filter, have you seen cases where it removes mc:AlternateContent tags from the document.xml resulting in Drawing objects being treated as text boxes in Word after export? After copying and pasting into a new document in Word after export the text boxes revert to Drawing objects, so some "memory" of the old object is retained, but a cursory examination of the document.xml before and after memoQ reveals some significant tag stripping.

    Replies
    1. No I haven't, but then I haven't had a file recently where that would occur. Have you reported it?

  2. Hi Kevin,

    Iwan's helping me get to the bottom of this "text in Word graphics" issue. We reported it to Kilgray a couple of weeks ago, and we're also working with them to help them pin the issue down.

    Keep taking the gingered lime tea, and "good bettering"!

    Nick Rosenthal

  3. I just experienced the same problem yesterday. I simply changed the extension from .doc to .docx to have the files imported into the project.

    Replies
    1. That won't help in most cases. The DOC/DOCX problem is more subtle than the example I use in workshops, where content from a word processing file is moved to Excel or PowerPoint, and possibly some quirks in Word allow the filters to be tricked as you suggest. However, there has been an update to memoQ 2015 in recent builds (100+, not sure about other supported versions at the moment), which addresses the issue discussed here in a very good way. I have been intending to add this information as an addendum to this article but simply haven't gotten to it yet.
      Some weeks ago, the Kilgray product manager showed me how the new Reimport function maintains previous settings if the reimported file type is the same as the last import of a given file, and if the file type is different, a more appropriate default is chosen. In cases such as multilingual Excel files or files with special import settings needed (such as to include hidden text for a word processing file), changes may still be needed to import settings, but generally things will work better with the new programming.

    2. Oh, that mysterious message about "ZIP" files has also been changed to something that mere mortals will understand. This should help guide people to the proper solution.

  4. Hello! Thanks for sharing your article; it was very useful. Do you know if I would be able to do a re-wrap up in a project after making minor changes in the source documents of the project? The changes are so small that I don't think it would be worth changing the Word document and reimporting it all into memoQ to track the changes and then confirm the translation.

    Marisel

