
Nov 4, 2019

Finding a file in some random memoQ project


One of the nice things about new users of any software is that they approach it without the ingrained habits of routine users of that software and often ask useful questions that the rest of us might not have considered. One such question was posed today by Aloísio Ferreira, a translation student at FCSH/NOVA in Lisbon. He thought it would be useful if there were a function to help translators or project managers locate a file in some memoQ project no longer remembered. I can see the point of this; on a number of occasions in the past, I have clicked around through various projects trying to do just such a thing, and I had not considered any better way of achieving my objective.

As many of you already know, Windows Explorer is able in some cases to index and search the content of files, and I knew that the majorVersionStore.info files in the project subfolders for memoQ included the names of files. So I went through the steps needed to ensure that the file contents would be indexed as plain text. All quite unnecessary, it turned out.

Feeling very sophisticated after updating which folders were to be indexed, I tested the idea in the folder window for my memoQ projects, which contains all the subfolders with the name of each project. As you can see from the results in the screenshot above, the projects also contain placeholder files (0 bytes in size) with the names of the files imported to translate.

So the short answer to Aloísio's question is that no new feature programming is needed in memoQ; simply go to your projects folder and do a search with part of the filename (use quotes if there are spaces in the name, as in the example above), and the path for the files in the results will show you which projects have what you are looking for.
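For those who would rather script the search than type it into Explorer, here is a minimal sketch in Python. It assumes only what is described above: a projects folder with one subfolder per project, each containing files (including the 0-byte placeholders) named after the imported documents. The function name `find_projects_with_file` and its parameters are my own invention for illustration, not anything provided by memoQ.

```python
from pathlib import Path


def find_projects_with_file(projects_root, name_part):
    """Map each project folder name to the files whose names contain name_part.

    Assumption: every direct subfolder of projects_root is one project.
    """
    root = Path(projects_root)
    needle = name_part.lower()
    hits = {}
    for path in root.rglob("*"):
        if not path.is_file() or needle not in path.name.lower():
            continue
        parts = path.relative_to(root).parts
        if len(parts) < 2:
            # A file lying directly in the root belongs to no project folder.
            continue
        # The first path component below the root is the project folder.
        hits.setdefault(parts[0], []).append(path)
    return hits
```

Calling `find_projects_with_file(your_projects_folder, "part of the name")` returns a dictionary whose keys are the project folder names, which is the same information you read off the paths in the Explorer results.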

From there you can use part of the project name in the filter field of the memoQ Dashboard to find the project you need, open it and work with the file in some way.


And of course once you have opened the project, if there are a lot of files in the list of Project home > Translations, there is another filter you can use to zero in on the one(s) you want quickly:

This screenshot is from a project with two target languages, created by the useful PM Edition of memoQ.

What good is all this? It depends. I usually go on a hunt like this if I am given a new version of some file I translated years ago and I can't remember which project it is in; once I find it, I can use the X-translate feature so that the pretranslation will use and lock any unchanged blocks of text from the old version. This can also be used (indirectly) to figure out which heavy resources (attached to the old project) may be useful for other work. I'm sure you can come up with half a dozen reasons of your own if you think about it.

Nov 3, 2019

Yahoogroups is dead. Check out groups.io and the migrated memoQ peer support!

A few weeks ago I saw a notice that Yahoo is taking down its old groups facility, which, back in the day, was like a jazzier version of the old listserves. At the beginning of my career as a commercial translator, I found the translation-related groups there to be enormously helpful, and I met many colleagues who were mentors to me and remain friends to this day. Unfortunately, some years ago, Yahoo reorganized the interface of the groups feature so that I often could not figure out how to use it any more, so aside from occasionally peeking at mailed digests of the content in half a dozen groups, I haven't participated actively in many years.

So I really wasn't sad to learn that YahooGroups are about to be axed. However, the need for better organized sites of this kind has hardly gone away. Although for many organizations and interests, Facebook has come to dominate group communications, Facebook sucks like a Kremlin vacuum cleaner from Hell when it comes to managing content for user advice and Help. Even users who are not lazy find it difficult to search for solutions already posted, so one tends to see the same help requests every week, sometimes the same issue more than once in a day. The archives of a good listserve are usually much better sources of help.

So I was pleased to hear that the YahooGroup for memoQ peer-to-peer support had migrated to a new platform at https://groups.io/g/memoQ/. And I hope other good CAT tool support groups do the same (feel free to post any such links in the comments).

Even if Facebook were not the cesspit of fake news, political and social manipulation that threatens the stability of so many countries around the world as well as the physical safety of everyone (live streaming mass slaughter isn't my idea of fun on a Saturday night, but then I am a bit old-fashioned), it is unlikely that it will ever become a good platform for the kind of technical information sharing among professional peers that we need. YahooGroups met that need once, and I think that these new incarnations on Groups.io may do a better job with less (or no?) trashy ad spam.

If you are a memoQ user, I encourage you to join the new group if you were not already on the old YahooGroups platform. (If you were, you have probably already been migrated by the helpful moderators.) Contribute your expertise, and ask the questions that need asking and answering for all of us to move forward with the technical challenges of the tools we use.

Oct 29, 2019

Bilingual EU legislation the easy way in #xl8


Translators of European languages based in the EU, and many others, often deal with citations of EU legislation or need to consult relevant EU legislation for terminology in their translations. One popular source of information for that is the EUR-LEX website, which provides a convenient archive of legislation and related information, with the possibility of multilingual text displays, as seen here:


Some years ago, I published a description of how data from these multilingual EUR-LEX displays can be transferred to translation memories or other corpora for reference purposes, and more recently I produced a video showing this same procedure. But some people don't like the paragraph-level alignment format of the EUR-LEX displays, and these can also occasionally be seriously out of sync for some reason, as in this example (or worse):


Now I don't find that much of a nuisance when I use memoQ LiveDocs, because I can simply view the full bilingual document context and see where the corresponding information really is (kind of like leaving alignments in memoQ uncorrected until you actually find a use for the data and determine that the effort is worthwhile), but if you plan to feed that aligned data to a translation memory, it's a bit of a disaster. And many people prefer data aligned at the sentence level anyway.

Well, there is a simple way to get the EU legislation texts you want, aligned at the sentence level, with the individual bitexts ready to import into a translation memory, LiveDocs corpus or other reference tool. See that document number above with the large red arrow pointing to it? That's where you start....

Did you know that much of the information in EUR-LEX can also be found in the publicly available DGT translation memories? These are sentence-level alignments. But most people go about using this data in a rather klutzy and unhelpful way. The "big data" craze some years ago had a lot of people trying to load this information into translation memories and other places, usually with miserable results. These include:

  • the inability to load such enormous data quantities in a CAT tool's TM without having far more computer RAM than most translators ever think they'll need;
  • very slow imports, some apparently proceeding on a geological time scale; 
  • data overload - so many concordance hits that users simply can't find the focused information they need; and
  • system performance degradation, with extremely sluggish responses in a wide variety of tasks.

Bulk data is for monkeys and those who haven't evolved professionally much beyond that stage. Precision data selection makes more sense, and enables better use of the resources available. But how can you achieve that precision? If I want the full bilingual text of EU Regulation No. 575/2013 in some language pair, for example, with sentence-level alignment, how can I find that quickly in the vast swamp of DGT data?

Years ago, I published an article describing how it is better to load the individual TMX files found in the downloadable ZIP archives from the DGT into LiveDocs so that the full document context can be seen from the concordance searches. What I didn't mention in that article is that the names of those individual TMX files correspond to the document numbers in EUR-LEX.

Armed with that knowledge, you can be very selective in what data and how much you load from the DGT collection. For example, if you organize the data releases in folders by year...


... and simply unpack the ZIP files in each year's folder...


... each folder will contain TMX files...


... the names of which correspond to the document number found in EUR-LEX. So a quick search in Windows Explorer or by other means can locate the exact document you want as a TMX file ready to import into your CAT tool:
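If you prefer to script the unpacking and the search, the steps above can be sketched in Python roughly as follows. This is only an illustration under the assumptions already stated: the downloaded ZIP archives sit in one folder per year, and the TMX file names contain the document number as it appears in EUR-LEX. The function names `unpack_archives` and `find_tmx` are hypothetical helpers of my own, not part of any DGT or memoQ tooling.

```python
import zipfile
from pathlib import Path


def unpack_archives(year_folder):
    """Extract every ZIP archive found in year_folder into that same folder."""
    folder = Path(year_folder)
    for archive in sorted(folder.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(folder)


def find_tmx(dgt_root, document_number):
    """Return all TMX files below dgt_root whose name contains document_number."""
    needle = document_number.lower()
    return sorted(
        path
        for path in Path(dgt_root).rglob("*")
        if path.is_file()
        and path.suffix.lower() == ".tmx"
        and needle in path.name.lower()
    )
```

Run `unpack_archives` once on each year's folder, then call `find_tmx` on the top-level folder with the document number copied from EUR-LEX. The match is a plain, case-insensitive substring test, so a partial number works too, at the cost of possibly returning several files.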


These TMX files typically contain 24 EU languages now, but most CAT tools will filter just the language pair you want. So the same file can usually give you Polish+French, German+English, Portuguese+Greek or whatever combination you need among the languages present.
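To make clear what that language filtering amounts to, here is a small Python sketch that pulls one language pair out of a multilingual TMX file. It is not how memoQ or any other CAT tool does it internally; it is just an illustration. It assumes the usual TMX layout of `tu` elements containing `tuv` elements with a `seg` child, with the language given in either an `xml:lang` or a `lang` attribute, and it compares only the primary language code (so "en" would match "EN-GB"). The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Fully qualified name of the xml:lang attribute as ElementTree reports it.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def primary_language(tuv):
    """Return the primary language code of a <tuv>, lowercased ('' if absent)."""
    code = tuv.get(XML_LANG) or tuv.get("lang") or ""
    return code.lower().split("-")[0]


def extract_pair(tmx_path, source_lang, target_lang):
    """Return (source, target) text pairs for two primary language codes."""
    src = source_lang.lower().split("-")[0]
    tgt = target_lang.lower().split("-")[0]
    pairs = []
    for tu in ET.parse(tmx_path).getroot().iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is None:
                continue
            # itertext() also collects text inside inline tags within <seg>.
            texts.setdefault(primary_language(tuv), "".join(seg.itertext()).strip())
        # Keep only units that have both requested languages.
        if src in texts and tgt in texts:
            pairs.append((texts[src], texts[tgt]))
    return pairs
```

A call such as `extract_pair(path_to_tmx, "de", "en")` gives a list of German and English segment pairs from the same file that could just as well yield Polish and French.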

I still prefer to import my TMX data into a LiveDocs corpus in memoQ, and there I can use the feature to import a folder structure, and in the import dialog, I simply write the name of the file I want, and all other files (thousands of them) are promptly excluded:


After I enter the file name in the Include files field, I click the Update button to refresh the view and confirm that only the file I want has been selected. Depending on where in memoQ you do the import, you may have to specify the languages to extract (in the Resource Console) or not (in a project, where the languages are already set). Of course, the data can also be imported to a translation memory in memoQ, but that is an inferior option, because then it is not possible to read the reference document in a bilingual view as you can in a LiveDocs corpus; only isolated segments can be viewed in the Concordance or Translation results pane.

How you work with these data and with what tools is up to you, but this procedure will provide you with a number of options for better data selection and improved access to the reference data you may need for EU legislation without getting stuck in the morass of millions of translation units in a performance-killing megabomb TM.