Translation Tribulations: memoQ 2015


Mar 4, 2017

memoQ Server Mystery: The 99% Solution

Several times in the past year, when working on memoQ Server projects for clients, I completed my translation but found that the progress bar was strangely stuck at 99%.

It wasn't until I took out another memoQ Cloud subscription for a collaborative project and encountered the same trouble that I realized what was going on.

If you are working on a server project, and graphics in the documents have been imported as well, these are assigned some default text noting that they need to be transcribed (as part of memoQ's interesting graphics translation and substitution workflow). If there is no text to transcribe and translate in these graphics, then nothing is usually done with the graphics. In a local project this does not matter.

But that little bit of default text is currently a problem in server projects. It blocks the use of the "Deliver/Return" function, which may upset the schedule planning of the project manager who assigned the work. And it is not the translator's fault.

The translator might not even see the graphics if they are not assigned. But even if they can be seen, they cannot be deleted from a checked-out project copy. Not even by someone with administrative privileges for the server.

The solution is to delete the graphics in the Manage Projects window.

This changes the progress bar to 100% after the checked-out project is synchronized, and the translated files can be delivered for further processing and review. Problem solved.

Jun 10, 2016

memoQ 2015 - big improvements, ongoing issues

The pace of life is slow in the Algarve, but memoQ users would generally agree that big TMX imports to memoQ are a lot slower. Until now.

While on holiday I got a note from an occasional client, asking if I could take on the translation of a short Interessenausgleich (a German agreement between employer and works council on planned operational changes) for the next day. Not a problem, I thought, though I had not yet imported the backup of my Big Mama TM with legal content onto the pokey Portuguese laptop I had dragged with me. With only 4 GB of RAM and the megatrojan known as Windows 10, it isn't much for crunching big data, but it is satisfyingly slow for demos and teaching classes, giving me plenty of time for questions and explanations while things load or refresh.

Nearly three hours later, the import was finished. So was I by that time; it was much too late to think of work any more. And the next morning, as I furiously worked to meet the deadline, a new update for memoQ appeared, which I was too muddle-headed to put off until I had delivered. After updating, not only was my license no longer recognized, but I could not even start the program to re-enter it! Welcome to Build 152! On another machine updated remotely, the update worked, but there was a funny conflict with Kilgray's standard template designations, causing the templates to be renamed.

The problems were sorted out by reinstalling the build from an installation file downloaded from memoQ.com, and the job made it out on time. More glitches with editing and adding term base entries showed up later that day as I worked on another text. Grr.

Fortunately, Kilgray responds quickly to most issues, and the automated bug reporting features recently introduced are probably a great help in pinpointing troubles which users are too often reluctant to complain about. The next morning a patched version - Build 153 - was released. So far, so good with it.

The latest build introduces some interesting new features, among them content extraction for translation from more Microsoft Office object types, including equations (I've waited years for this!!!), faster TM imports, word count weighting, better tag handling, improved QA checks and some other things. Looking at all this will take some time, but it will be time well spent for my work purposes.

I was intrigued by the statement that "importing a TM with really many segments has just gotten up to 70 times faster", so I decided to test that with the file that bedeviled my evening recently. The problem with marketing hype is that it emphasizes exceptional cases and too seldom reflects what a typical work experience might be. And here too this is the case. The import of approximately 330,000 segments was completed in just under half an hour, about six times faster. Perhaps on my souped-up, RAM-rich machine in the office the difference might be more dramatic, but that is still a great improvement which will make me less hesitant to do maintenance on my large data sets.
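As a quick check of the "about six times faster" figure, here is the arithmetic in Python. It assumes roughly 180 minutes for the earlier import ("nearly three hours") and 30 minutes for the new one ("just under half an hour"); both durations are rounded assumptions, not measurements.

```python
# Rough throughput check for the TMX import described above.
# The durations are rounded assumptions based on "nearly three hours"
# and "just under half an hour", not measured values.
SEGMENTS = 330_000
OLD_MINUTES = 180  # assumed duration before the update
NEW_MINUTES = 30   # assumed duration with the new build


def segments_per_minute(segments: int, minutes: float) -> float:
    """Average import throughput in segments per minute."""
    return segments / minutes


old_rate = segments_per_minute(SEGMENTS, OLD_MINUTES)
new_rate = segments_per_minute(SEGMENTS, NEW_MINUTES)
speedup = OLD_MINUTES / NEW_MINUTES

print(f"before:  {old_rate:,.0f} segments/min")
print(f"after:   {new_rate:,.0f} segments/min")
print(f"speedup: {speedup:.1f}x")
```

At the advertised factor of 70, the same import would have had to finish in about 180 / 70 ≈ 2.6 minutes, which shows how far a typical job can be from the best case.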

memoQ 2015 has had a rockier road than any other version since the infamous memoQ Version 6, for which the code base was rewritten; a bit over a year after its release, it still surprises me with inconvenient errors at inconvenient times, in contrast to my usual expectation of stability and reliability about four months after a major release. Actually, memoQ 2015 did seem to reach that point early on, but about five months after its release, in late autumn last year, things went to Hell with SDL compatibility in many projects. Interesting new features, such as regex filters for the working grid and find/replace functions, continued to be added, but it all had a bit of a bleeding-edge feel.

Now that memoQ has survived the awful transition to the me-too ribbon interface, it really is time to focus on stability and process reliability. The tool has become a major part of major corporate translation operations as well as the daily work of countless independent translators and we need stability to support our business more than we need new technotunes to whistle. I appreciate many of the features introduced in the past year very much, and I use a lot of them (the new keyboard shortcuts are my favorite for managing writes to termbases properly when I work), but I might trade it all if I can just be sure again that the software won't crash and burn me just before a translation is to be sent to a client.

I am confident that these matters will be addressed, but the question of when is of critical importance for many. Issues of novelty versus reliability are hardly new in the software world; I have seen every variation of this for about 40 years now, and Kilgray generally, though not always, earns better marks than the competition. And even with the troubles that have beset memoQ 2015 (which have made it very desirable to keep older versions installed too) it is still the best - and most reliable - tool I have seen for the ways I work.

Mar 17, 2016

Dynamic filtering with regular expressions in memoQ

Regular expressions (aka regex) are not a tool for everyone, though this is something that the nerdily inclined often fail to appreciate. For average users, a plain language query interface, perhaps with more limited options, is generally more accessible and more widely used. However, sometimes it's nice to have such "shortcuts" available to select particular structures in a text for translation or editing, and the many people who complained for years that Kilgray did not provide a dynamic regex filter for the working translation grid - a feature of SDL Trados Studio for quite a while now - did have a point worth addressing in development. Now that has happened, though still somewhat incompletely when considered in the full scope of memoQ's usual features for selecting text.

memoQ uses regex in a number of its modules, and Kilgray has several webinars which describe these applications, though they require some stamina to watch, and I expect that most people will become hopelessly confused if they try to take in more than one area of application in a single sitting. The uses of regex for segmentation rules, tagging, autotranslatables and text filtering on document import (with the Regex Text Filter) are very different in their approach, even though the underlying syntax of the regex is the same. However, all of these applications allow the configured rules to be saved and re-used, so one could ask an expert to create the settings needed and provide these in a resource file, and many users do exactly that. Thus, as long as one understands that regex can be used for a particular problem, the details can be hired out.

This new application of regex for dynamic filtering, introduced in recent builds of memoQ 2015, is a little different (at present). Although the Find/Replace dialog will "remember" regex syntax in its dropdown menu of recent expressions, there is no way to store these expressions, and otherwise they must be entered manually each time. This means that, for now, the average user will have to collect useful expressions like a tourist might scribble phrases in a notebook to use on holiday in a foreign country, and those with a little more sense of adventure might find themselves with a hovercraft full of eels and wonder why.

One such phrase might be the example in the screenshot above. I was translating some financial statements with several formats present for digits in account numbers, dates and monetary expressions. In order to work more systematically with these various formats, I used several different regular expressions to sort and separate them. In the example I was looking for instances where at least four digits were written together in a source segment. That isn't terribly selective, but most of these occurrences in my documents were account numbers, and this filter cleaned up the view a lot and allowed me to work a little faster. Other expressions were used to QA date formats and monetary expressions more specifically.
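A minimal sketch of the four-digit filter described above, using Python's re module on a few hypothetical German source segments. memoQ itself uses .NET regular expressions, but a simple pattern like \d{4,} behaves the same way in both; the segments and the helper name filter_segments are illustrative only.

```python
import re

# "At least four digits written together", as in the filter described above.
FOUR_PLUS_DIGITS = re.compile(r"\d{4,}")

# Hypothetical source segments from a financial statement.
segments = [
    "Konto 4400 Umsatzerlöse",
    "Stand zum 31.12.2015",
    "Sonstige betriebliche Aufwendungen: 1.234,56 EUR",
    "Forderungen aus Lieferungen und Leistungen (Konto 1400)",
]


def filter_segments(pattern: re.Pattern, texts: list[str]) -> list[str]:
    """Return the segments in which the pattern matches anywhere."""
    return [t for t in texts if pattern.search(t)]


for hit in filter_segments(FOUR_PLUS_DIGITS, segments):
    print(hit)
```

The date segment is caught too, because the year 2015 is also four digits written together, while the amount 1.234,56 is not, since its digit groups are split by punctuation. That is why such a pattern "isn't terribly selective" and why separate expressions for dates and amounts are useful.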

In the working grid for translation and editing, regular expressions can be used in one or both of the fields for the source and target text when the checkbox in the toolbar at the right is marked. Or the regular expressions option in the Find/Replace dialog can be used.
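With both the source and target fields filled in, a filter can double as a QA check. The sketch below looks for amounts still written in German format (1.234,56) in the target of a DE>EN translation. It assumes the two conditions are combined with AND; the rows, the pattern and the helper filter_rows are hypothetical, and this is not a description of how memoQ evaluates its two fields internally.

```python
import re

# German-style amounts such as 1.234,56 (thousands dot, decimal comma).
GERMAN_AMOUNT = re.compile(r"\b\d{1,3}(?:\.\d{3})*,\d{2}\b")

# Hypothetical (source, target) pairs from a DE>EN job.
rows = [
    ("Umsatz: 1.234,56 EUR", "Revenue: EUR 1,234.56"),
    ("Zinsen: 12.000,00 EUR", "Interest: EUR 12.000,00"),
    ("Konto 4400", "Account 4400"),
]


def filter_rows(src_pat, tgt_pat, pairs):
    """Keep pairs where the source matches src_pat AND the target matches tgt_pat.

    The AND combination is an assumption made for this sketch.
    """
    return [
        (src, tgt)
        for src, tgt in pairs
        if src_pat.search(src) and tgt_pat.search(tgt)
    ]


# Flag targets that still use the German number format.
for src, tgt in filter_rows(GERMAN_AMOUNT, GERMAN_AMOUNT, rows):
    print(f"{src!r} -> {tgt!r}")
```

Only the second row is flagged: its target still reads 12.000,00, while the first row's amount was converted to English format.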

It is somewhat disappointing that regex cannot be used to create static views at the present time. While marking can be used in the Find dialog to enable one to go back and forth between the filter criteria and other configurations of the working grid, there is no way to make a permanent "record" of the filtered segments. For quite a few years, I have wished for the possibility to save the results of my filtering in the working grid in some sort of view, but I was always able at least to recreate the filtering criteria in the dialog to create a memoQ View, which could then be opened at any time or exported in various formats for clients and project collaborators. However, at the moment that is not possible with regex filtering. (There are workarounds involving a change in segment status, but these are often inconvenient in a project in progress.)

The addition of regex filtering to the working grid in memoQ is a welcome feature for many, which I hope will be expanded by Kilgray in the future to achieve more of its potential. But to take advantage of this potential in any way, the average user will indeed need a "phrase book" of sorts, and an efficient way of managing useful collected regex snippets (and naming them for easier re-use in searches and filtering) would be very desirable. If these "regex phrase books" for dynamic filtering and view creation could be saved as shareable light resources, it would be possible to build many useful collections to help users at all levels with translation, editing and quality assurance tasks.
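To make the "phrase book" idea concrete, here is one possible shape for it outside memoQ: named patterns with short notes, saved as a JSON file that could be shared and loaded again. Everything here (the JSON layout, the pattern names, the helper functions) is a hypothetical illustration; memoQ currently offers no such resource type.

```python
import json
import re
from pathlib import Path

# Hypothetical phrase book: pattern names, notes and the JSON layout are
# invented for this sketch; memoQ has no such resource type today.
PHRASE_BOOK = {
    "account-like numbers": {
        "pattern": r"\d{4,}",
        "note": "four or more digits together; also catches years",
    },
    "German amounts": {
        "pattern": r"\b\d{1,3}(?:\.\d{3})*,\d{2}\b",
        "note": "amounts written like 1.234,56",
    },
    "German dates": {
        "pattern": r"\b\d{1,2}\.\d{1,2}\.\d{4}\b",
        "note": "dates written like 31.12.2015",
    },
}


def save_phrase_book(book: dict, path: Path) -> None:
    """Write the named patterns to a UTF-8 JSON file for sharing."""
    path.write_text(json.dumps(book, indent=2, ensure_ascii=False), encoding="utf-8")


def load_phrase_book(path: Path) -> dict:
    """Read the JSON file back and compile each named pattern."""
    raw = json.loads(path.read_text(encoding="utf-8"))
    return {name: re.compile(entry["pattern"]) for name, entry in raw.items()}
```

For example, save_phrase_book(PHRASE_BOOK, Path("regex_phrase_book.json")) writes the collection (the file name is arbitrary), and load_phrase_book returns compiled patterns whose source strings can be copied into the grid's filter fields. Keeping a short note next to each pattern is the "naming" part of the wish: the note says what the expression is for and what it catches by accident.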

Oct 15, 2015

The Invisible Hand of memoQ LiveDocs - making "broken" corpora work

Last month I published a post describing the "rules" for document visibility in the list of documents for a memoQ LiveDocs corpus. Further study has revealed that this is only part of the real story and is somewhat misleading.

I (wrongly) assumed that, in a LiveDocs corpus, if a document was visible in the list its content was available in concordance searches or the Translation Results pane, and if it was not shown in the list of documents for the corpus in the project, its content would not be available in the concordance or Translation Results pane. Both assumptions proved wrong in particular cases.

In the most recent versions of memoQ, for corpora created and indexed in those versions, all documents in a corpus shown in the list will be available in the concordance search and the Translation Results pane as expected. And the rules for what is currently shown in the list are described accurately in my previous post on this topic. However,

if there are documents in the corpus which share the same main language (as EN-US and EN-UK both share the main language, English) but are not shown in the list, these will still be used for matching in the memoQ Concordance and Translation Results and
if the corpus was created in an older version of memoQ (such as memoQ 2013R2), documents shown in the list of a corpus may in fact not show up in a Concordance search or in the Translation Results.

This second behavior - documents shown in the list but their content not appearing in searches - has been described to me recently by several people, but it could not be reproduced at first, so I thought they must be mistaken, and statements that "sometimes it works and sometimes it doesn't" made these pronouncements seem even more suspect. Except that they happen to be true and I now (sort of) understand why.

Prior to publishing my post to describe the rules governing the display of documents for a LiveDocs corpus in a project, I had been part of a somewhat confusing discussion with one of my favorite Kilgray experts, who mentioned monolingual "stub" documents a number of times as a possible solution to content availability in a corpus, but when I tried to test his suggestion and saw that the list of documents on display in the corpus had not expanded to include content I knew was there, I thought he was wrong. But actually, he was right; we were talking about two different things - visibility of a document versus availability of its content.

For purposes of this discussion, a stub document is a small file with content of no importance, added only to create the desired behavior in memoQ LiveDocs. It might be a little text file - "stubby.txt" - with any nonsense in it.

I went back to my test projects and corpora used to prepare the last article and found that in fact for the main languages in a project all the content was available from the corpora, regardless of whether the relevant documents were displayed in the list. In the case of a corpus not offered in the list for a project because of sublanguage mismatches in the source and target, adding a stub document with either a generic setting (DE, EN, PT, etc.) or sublanguage-specific setting for the source language or the correct sublanguage setting for the target (DE-CH, EN-US, etc.) made all the corpus content for the main languages available instantly. (In the project, documents added will have the project language settings; use the Resource Console for any other language settings you want.)

Content of a test corpus before a stub document was added. Viewed in the Resource Console.

The test corpus with the document list shown in my project; only the stub document is displayed, but
all the indexed content shown above is also available in the Concordance and Translation Results.

It is unfortunate that in the current versions of memoQ the document list for a corpus in a project may not correspond to its actual content for the main languages. Not only does this preclude accessing a document's content without a match or a search, it also means that binary documents (such as one of the PDF files shown in the list) cannot be opened from within the project. I hope this bug will be fixed soon.

Since a few of my friends, colleagues and clients were concerned about odd behavior involving older corpora, I decided to have a look at those as well. Kilgray Support had made a general recommendation of rebuilding these corpora or had at least suggested that problems might occur, so I was expecting something.

And I found it. Test corpora created in the older version of memoQ (2013 R2) behaved in a way similar to my tests with memoQ 2015 - although the "display rules" for documents in the list differed as I described in my previous blog post, the content of "hidden" documents was available in Concordance searches and in the Translation Results pane. But....

When I accessed these corpora created in memoQ 2013 R2 using memoQ 2015, even if I could see documents (for example, a monolingual source document with a generic setting), the content was available in neither the Concordance nor the Translation Results until I added an appropriate stub document under memoQ 2015. Then suddenly the index worked under memoQ 2015 and I could access all the content, regardless of whether the documents were displayed in the list. If I deleted the stub document, the content became inaccessible again.

So what should we do to make sure that all the content of our memoQ corpora is available for searches in the Concordance or matches in the Translation Results?

If you always work out of the same main source language (which in my case would be German or "DE", regardless of whether the variant is from Germany, Austria or Switzerland), then add a generic language stub document for your source language to all corpora - old and new - under memoQ 2015 and all will be well.

If your corpora will be used bidirectionally, then add a generic stub for both the source and target to those corpora or add a "bilingual stub" with generic settings for both languages. This will ensure that the content remains available if you want to use the corpora later in projects with the source and target reversed.

Although it's hard to understand the principles governing what is displayed, when and why, following the advice above will at least eliminate the problem of content not being available for pretranslation, concordance searches and translation grid matches. And the mystery of inconsistent behavior for older corpora appears to be solved. The cases where these older corpora have "worked" (i.e. their content has been accessible in the Concordance, etc.) are cases where new documents were added to them under recent versions of memoQ. If you just keep adding to your corpora, particularly from a project with generic language settings, you won't have to bother with stub documents and your content will be accessible.

And if Kilgray deals with that list bug so we actually see all the documents in a corpus which share the main languages of a project, including the binary ones, then I think the confusion among users will be reduced considerably.

Sep 16, 2015

Getting around language variant issues in memoQ LiveDocs

I was told by some other users that a fundamental change had been made in the way language data are accessed in LiveDocs. It was said that until a few versions ago it had been possible to use documents for reference in LiveDocs regardless of their sublanguage settings. So I was told. The truth is more complicated than that.

According to my tests, memoQ 2015 is the first version of memoQ to have a logically consistent treatment of language variants for both bilingual and monolingual documents in corpora. All the other versions tested (memoQ 2013R2, 2014, 2014R2) are equally screwed up and show the same results.

The "visibility" of a monolingual or bilingual document when viewed in a corpus attached to a project running under memoQ 2015 follows these rules:

the sublanguage (language variant) settings for source and target of the document must match those of the project
or the language setting (of the document or the project) must be generic.

Two rules. Pretty simple. It doesn't matter what version of memoQ the project or corpus was created in, only which version is actively running.
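The two rules can be written down as a small predicate, which is one way to "work out" the screenshots that follow. This is a sketch of one reading of the rules, not Kilgray's code. Language settings are modeled as strings such as "de" (generic) or "de-CH" (variant), the main languages must agree in any case, and a monolingual document is assumed to be listed if its language fits either project language; all of these are assumptions of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


def main_language(code: str) -> str:
    """'de-CH' -> 'de'; a generic 'de' stays 'de'."""
    return code.split("-")[0].lower()


def is_generic(code: str) -> bool:
    """A setting without a variant suffix, such as 'de' or 'en', is generic."""
    return "-" not in code


def compatible(doc_lang: str, project_lang: str) -> bool:
    """One side of the rule: same main language, and either the variants
    match exactly or the document or project setting is generic."""
    if main_language(doc_lang) != main_language(project_lang):
        return False
    return (
        doc_lang.lower() == project_lang.lower()
        or is_generic(doc_lang)
        or is_generic(project_lang)
    )


@dataclass
class Document:
    name: str
    source: str
    target: Optional[str] = None  # None marks a monolingual document


def visible_2015(doc: Document, proj_source: str, proj_target: str) -> bool:
    """Predict whether a corpus document is listed in a memoQ 2015 project."""
    if doc.target is None:
        # Assumption: a monolingual document may match either project language.
        return compatible(doc.source, proj_source) or compatible(doc.source, proj_target)
    return compatible(doc.source, proj_source) and compatible(doc.target, proj_target)


# Hypothetical corpus checked against a Swiss German > generic English project.
corpus = [
    Document("bilingual de-CH/en-GB", "de-CH", "en-GB"),
    Document("bilingual de/en-US", "de", "en-US"),
    Document("monolingual de-LI", "de-LI"),
    Document("monolingual en-ZW", "en-ZW"),
]
for doc in corpus:
    print(doc.name, visible_2015(doc, "de-CH", "en"))
```

With generic settings on both sides of the project ("de" and "en"), every document in this hypothetical corpus is predicted to be listed, which matches the observation that all 11 test documents were accessible in a generic German/English project.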

I created a test corpus with the following document mix:

The corpus contained 11 documents, both bilingual and monolingual with a mix of generic language settings and settings with language variants specified (such as German for Germany, Switzerland and Liechtenstein and English for Zimbabwe, the US and UK).

In a project running under memoQ 2015 with the languages set to generic German and generic English, all 11 documents in the corpus were accessible.

So if you want access to all LiveDocs corpus data for the major languages of your project, it is necessary to use generic language settings, either when you load the data into LiveDocs (difficult unless you always use the Resource Console, since adding documents to a corpus from within a project automatically applies the project's language settings!) or in the languages specified for the project itself. And this will only work with memoQ 2015. If you want to apply penalties to particular language variants, this can be done using keyword markers (as seen in the screenshot above) and configuring the More penalties tab of the LiveDocs settings file applied to that corpus.

If the same corpus is attached to a project running under memoQ 2015 with language settings for Swiss German and generic English, the documents available from the corpus are these:

For a Swiss German and UK English project under memoQ 2015, this is the picture:

And for a German (Germany) and US English project:

All the screenshots above can be predicted based on the two rules stated. Work it out.

"But what happens with earlier versions of memoQ?" you might wonder. It's messy. Here is a look at a Swiss German and UK English project under memoQ 2013 R2, 2014 and 2014 R2:

And here's a project with generic German and generic English under memoQ 2013 R2, 2014 and 2014 R2:

In each case the five bilingual documents are visible no matter what the project's language settings are. However, there is strict adherence to language variants and the generic language setting for monolingual documents! In my opinion, that's for the birds. I see no good reason to follow a different rule for data availability in bilingual versus monolingual documents. So in a sense, Kilgray has cleaned up this inconsistency in the latest version of memoQ.
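For comparison, the behavior observed under memoQ 2013 R2, 2014 and 2014 R2 can be sketched the same way. This is an interpretation of the description above, using the same string convention for language settings ("de", "de-CH"); requiring the main languages of bilingual documents to agree is an assumption, since all the tests used German and English.

```python
from typing import Optional


def main_language(code: str) -> str:
    """'de-CH' -> 'de'; a generic 'de' stays 'de'."""
    return code.split("-")[0].lower()


def visible_pre_2015(doc_source: str, doc_target: Optional[str],
                     proj_source: str, proj_target: str) -> bool:
    """One interpretation of the listing behavior in memoQ 2013 R2, 2014 and 2014 R2."""
    if doc_target is not None:
        # Bilingual documents: variant settings are ignored; only the main
        # languages are compared (an assumption, as only DE/EN was tested).
        return (main_language(doc_source) == main_language(proj_source)
                and main_language(doc_target) == main_language(proj_target))
    # Monolingual documents: the setting must equal a project language
    # exactly, so a generic 'de' document appears only in a generic 'de' project.
    setting = doc_source.lower()
    return setting in (proj_source.lower(), proj_target.lower())


# Swiss German > UK English project: the bilingual document is listed,
# the generic monolingual one is not.
print(visible_pre_2015("de", "en-US", "de-CH", "en-GB"))
print(visible_pre_2015("de", None, "de-CH", "en-GB"))
```

Set side by side with the memoQ 2015 rules, the difference is exactly the inconsistency criticized here: two different tests for bilingual and monolingual data.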

Some have expressed a desire for a "switch" setting to allow language variant settings to be ignored. And perhaps Kilgray will provide such a feature in the future. But the best way to get there now is simply to make your project's language settings generic.

Changing the language settings for bilingual data in an existing LiveDocs corpus

If you have a corpus with a mix of language settings and you want to convert these to generic settings or a particular variant, this can currently be done only for bilingual documents, as follows:

Select the bilingual documents to export from the corpus and export them to a folder. (If you choose to zip them all together, unpack the *.zip file later to make a folder of the exported *.mqxlz files.)
Re-import the *.mqxlz files to the LiveDocs corpus via the Resource Console so you are able to specify the exact language settings you want. In the import dialog, you'll have to change the filter setting manually from "binary" to "XLIFF". These *.mqxlz files are not the same as bilingual files from a translation document in a project and are not recognized automatically.
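If the exported documents were zipped together, the unpacking in the first step can be scripted. This sketch uses only Python's standard zipfile module; the function name unpack_mqxlz is hypothetical, and it simply pulls the *.mqxlz members out of the archive into one flat folder, ready for re-import via the Resource Console.

```python
import zipfile
from pathlib import Path


def unpack_mqxlz(zip_path: Path, out_dir: Path) -> list[Path]:
    """Extract only the *.mqxlz members of an export archive into out_dir.

    Members are written flat (without their folder path), so the result is a
    single folder of *.mqxlz files. Using only the base name also keeps
    members from being written outside out_dir.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".mqxlz"):
                continue
            target = out_dir / Path(info.filename).name
            target.write_bytes(archive.read(info))
            extracted.append(target)
    return extracted
```

For example, unpack_mqxlz(Path("corpus_export.zip"), Path("reimport")), with file names of your own. Files with the same name in different folders of the archive would overwrite each other in the flat output folder, so check for duplicates if the export contains subfolders.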

Unfortunately, there is no way to change the language settings of a monolingual document except to re-import it in the Resource Console in its original form and set the language variant (or generic value) there.

So really, for now, the best way to go seems to be to use memoQ 2015 with generic project language settings.

Jun 14, 2015

Introduction to memoQ in Lisbon / Introdução ao memoQ em Lisboa

From July 6th to the 11th, the summer school of the New University in Lisbon will offer a week-long introductory course for memoQ 2015, which includes:

The functional modules of the memoQ translation environment and how these work together;

Common workflows for translation and editing tasks;

Making use of legacy translations and data from other environments;

Collaboration with users of other translation environments (SDL Trados Studio, OmegaT, etc.);

Tips for problem-solving and added value for translation customers.

The course will be taught by me and Professor David Hardisty, with whom I have spent most of this year so far exploring innovative speech recognition and editing workflows, which will also be an important part of this course. It's a pleasure to work with David, because not only does he have a strong commitment to the success of his students, but he has a marvelous talent for taking my concepts and recasting them in a way that works really, really well for undergraduate and graduate students at all levels.

The course is open to anyone (limited enrollment, 16 persons I think) and offers 24 hours of instruction in the week. It will be taught in English with summaries in Portuguese.

A description of the course in English and Portuguese is here. I am not responsible for the errors in the English. Registration information (Portuguese only, alas) is on this page. Attendees who don't read Portuguese but manage to figure out how to register nonetheless will receive a special reward during the course.

During the week, the course is offered in the evenings, leaving the days free for work or local tourism. It is recommended that translators make a pilgrimage to the monastery named after the patron saint of translation in Belém. Miracles have occurred there, carpal tunnel syndrome has been healed and dead text has even come to life, but not even the intervention of St. Jerome can save a machine pseudo-translation. You might learn that trick from us, however. Or not.

On August 3rd a course on project management with memoQ will be offered on a similar plan.
