Aug 14, 2017

Oxford living dictionaries for "other" languages

I had some difficulty deciding how to title this post, given the historically loaded connotations of possible alternatives. The Oxford Dictionaries project does a lot of useful stuff, offering quite a number of monolingual and bilingual dictionaries free and by subscription, which are of great value to editors and translators.

I am particularly excited and encouraged to see bilingual and monolingual resources from Oxford for some common African languages now, such as Setswana, Swahili, Northern Sotho and isiZulu. In recent years it has been a great blessing to meet some African colleagues from Egypt, Nigeria, Kenya, Angola and elsewhere at IAPTI events, memoQfest or other venues. In some of my education support efforts through IAPTI I have found rather interesting resources in South Africa and a few other places, but on the whole it appears to me as an outsider that colleagues there face a relative shortage of resources for any work they might do with local languages not transplanted from Europe. So it is a great pleasure for me personally to discover and share such resources (and I would encourage others to do so as well in the comments below).

The Oxford Global Languages collection also features other important languages such as Indonesian, Malay and various Indian languages like Hindi, Gujarati, Tamil and Urdu. And then there are the usual suspects like English and Spanish.

I fell in love with the Oxford English Dictionary as a child, when I found the long shelf filled with its volumes of historical etymology. The dictionaries mentioned and linked here are focused more on current usage of living languages, but they should have much of the same scholarship and rigor that goes into the making of that marvelous OED. Enjoy.

Aug 11, 2017

The memoQ Web Search memory leak fix! (updated again)

A big thank you to Italian veterinary surgeon and translating colleague Claudio Porcellana, who solved the mystery of the memory leak which has plagued users of memoQ's Web Search for years now. While Kilgray developers busily work on alternative engines for fixing future versions, Dr. Porcellana used his head - as impatient Southern Europeans are wont to do.

The problem, it seems, is with troublesome Java applets on sites like Linguee. So he simply turned them off. And plugged the leak.

Kilgray currently uses an Internet Explorer component for memoQ Web Search, so here's the fix:
  1. Start Internet Explorer and open Internet options in the Settings.

  2. Go to the Security tab and click the Custom level button.

  3. Then find the Scripting section and disable the Java applets.

Leave Active scripting (= JavaScript, etc.) enabled or you will mess up the search for some sites like LEO.

After I made this change, I tested memoQ Web Search. Instead of the usual steady increase in memory consumption I used to observe due to the infamous leak, everything remained rock stable, and all my site searches that I typically use for legal and scientific translation worked just fine.

This fix ought to work with all versions of memoQ since the introduction of the web search feature (in memoQ 2013 R2 I think it was). So thank you, Dr. Porcellana, for making our working lives a little less crash-prone!

UPDATE: Further testing has revealed (as noted in some comments below) that there is more to the story. I was puzzled that some people continued to experience the memory leak unless "active scripting" was also disabled, and at Varga's request I tested again on my system (until then I had suspected that his troubles might be tied to a Hungarian system, but that turns out not to be the case). To my astonishment, the problem reappeared, although disabling the Java applet scripting alone had eliminated it before. I had to turn off "active scripting" too to achieve stability. And then suddenly the problem went away again.

Puzzling, right? And annoying of course. And then an idea occurred to me, and I dug up my Linguee user account password and logged in to Linguee under my user name. I contribute a lot of terms when I search in other browsers so I have a lot of credit, and this credit is applied as searches without ads.

It's the advertising. Some ads seem to involve Java applets. Other ads do buggy things with scripts that do not use applets. And some ads do neither of these two things and cause no trouble.

Maybe an ad blocker applied to Internet Explorer will fix the problem for memoQ Web Search until the changeover to Chromium occurs in the next version. [No, it does not, alas.] In the meantime, I will achieve stability for today's big job by staying logged in to my Linguee account!

YET ANOTHER UPDATE: As advertisements and the like have been identified as the real source of trouble, one user suggested substituting the Windows hosts file. This approach has a number of advantages apparently; it presumably de-craps your Internet connection by blocking sites that send troublesome content, communicate with spyware, etc. A better hosts file with instructions for where to put it is found at:

Substituted hosts file on my Windows 10 system; the old file was backed up by re-naming it.
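
For those who would rather script the swap than do it by hand, here is a minimal Python sketch of the same backup-by-renaming step. It is only an illustration: the function name replace_hosts_file and the ".bak" suffix are my own assumptions, not part of any tool mentioned here. On Windows the hosts file normally lives at C:\Windows\System32\drivers\etc\hosts, and changing it requires administrator rights.

```python
import shutil
from pathlib import Path


def replace_hosts_file(current: Path, replacement: Path,
                       backup_suffix: str = ".bak") -> Path:
    """Back up `current` by renaming it, then copy `replacement` into its place.

    Hypothetical helper for illustration; returns the path of the backup file.
    """
    backup = current.with_name(current.name + backup_suffix)
    if backup.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(f"Backup already exists: {backup}")
    current.rename(backup)                 # keep the old file under a new name
    shutil.copyfile(replacement, current)  # install the substitute hosts file
    return backup
```

It would be called with the path to the current hosts file and the path to the downloaded substitute, from a console started as administrator. To undo the change, delete the new file and rename the backup to hosts again.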

Aug 3, 2017

"Coming to Terms" workshop materials for terminology mining

I recently put together a two-hour online workshop to teach some practical aspects of terminology mining and the creation and management of stopword lists to filter out unwanted word "noise" and get to interesting specialist terminology faster.
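
To give a rough idea of what a stopword list does, here is a minimal Python sketch. It is not how AntConc or memoQ work internally; the function name candidate_terms, the min_count threshold and the simple word-splitting rule are all assumptions made for illustration. It counts the words in a text, drops everything on the stopword list, and returns what is left, most frequent first.

```python
import re
from collections import Counter


def candidate_terms(text, stopwords, min_count=2):
    """Return (word, frequency) pairs for words not on the stopword list.

    Hypothetical helper for illustration: single words only, lowercased,
    hyphenated compounds kept together.
    """
    # Runs of letters, optionally joined by hyphens (digits and punctuation split words).
    words = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    # Keep only words that occur at least min_count times, most frequent first.
    return [(w, n) for w, n in counts.most_common() if n >= min_count]
```

Given a short lease clause and a stopword list containing "the", "shall", "is", "and" and "not", the function would leave words such as "tenant" and "rent" as candidates. Real extraction tools also handle multi-word terms, which this sketch does not attempt.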

A recording of the talk as well as the slides and a folder of diverse resources usable with a variety of tools are available at this short URL: The TVS recording file can be opened and played by the free TeamViewer application.

The discussion focuses primarily on Laurence Anthony's AntConc and the terminology extraction module of Kilgray's memoQ.