Jun 3, 2021

A Hebrew abbreviations "hint base" points the way for other languages

Years ago I published a guideline for creating something like a term base for memoQ that can handle the erratic ways in which German attorneys on tight deadlines type the many abbreviations they use. The memoQ term base model can't cope with punctuation and many special characters, so it's basically impossible to use it to map something like "US-$" to the standard currency code "USD". But regular expressions in an auto-translation rule can do that, of course.
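
To make the idea concrete outside of memoQ, here is a minimal Python sketch of the same kind of mapping. It is not memoQ rule syntax; the pattern and the function name normalize_currency are my own assumptions for illustration, and the pattern only covers "US" followed by an optional hyphen or space and a dollar sign.

```python
import re

# Assumed pattern for illustration: "US", an optional hyphen or
# whitespace character, then a literal dollar sign.
US_DOLLAR = re.compile(r"US[-\s]?\$")


def normalize_currency(text: str) -> str:
    """Replace typed variants such as "US-$" or "US$" with the code "USD"."""
    return US_DOLLAR.sub("USD", text)


if __name__ == "__main__":
    print(normalize_currency("Der Streitwert beträgt US-$ 5.000"))
```

A term base lookup fails here because of the hyphen and the dollar sign; the regular expression treats them as ordinary literal characters once the dollar sign is escaped.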

The same principle can be used simply to map abbreviations to their full expression so the translator can decode the abbreviation and decide how to render it. Here's an example of that in Hebrew:

This can, of course, be done in other languages, but the fellow who had this idea and asked me about it happens to be a Hebrew translator working into several target languages. I'm tempted to adapt one of my German abbreviation sets to map to the full German expression in the target to serve as an aid to translators who might not be as familiar with the abbreviations as I am and who are also not bound strictly to a particular target language expression. A cheat sheet, basically, or a "hint base" if there is such a thing.

The code for this is particularly simple. Here's a quick look at the resource in an external editor:


The basic "engine" is just a list (#abbreviations#), and the resource was created quickly using search and replace on a list of over 600 abbreviations in an Excel spreadsheet.
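
The list-driven principle can be sketched in Python as follows. This is only an emulation of the lookup idea, not the memoQ resource format: the three German abbreviation pairs are sample data I picked for illustration, and load_pairs, build_pattern and hints are hypothetical helper names. The sketch assumes the spreadsheet columns were copied out as tab-separated text.

```python
import re

# Sample data only: tab-separated pairs as they might be copied
# from two spreadsheet columns (abbreviation, full expression).
RAW = "z.B.\tzum Beispiel\ni.V.m.\tin Verbindung mit\nu.a.\tunter anderem"


def load_pairs(raw: str) -> dict:
    """Turn tab-separated lines into an abbreviation -> expansion mapping."""
    pairs = {}
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip empty rows
        abbr, full = line.split("\t", 1)
        pairs[abbr.strip()] = full.strip()
    return pairs


def build_pattern(pairs: dict) -> re.Pattern:
    """Build one alternation from all list entries, longest entry first."""
    keys = sorted(pairs, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in keys)
    # The lookbehind keeps an entry from matching in the middle of a word.
    return re.compile(r"(?<!\w)(?:" + alternation + r")")


def hints(segment: str, pairs: dict) -> list:
    """Return (abbreviation, expansion) for each list entry found in the segment."""
    pattern = build_pattern(pairs)
    return [(m.group(0), pairs[m.group(0)]) for m in pattern.finditer(segment)]


if __name__ == "__main__":
    for abbr, full in hints("Dies gilt z.B. i.V.m. § 5", load_pairs(RAW)):
        print(abbr, "->", full)
```

All of the matching logic lives in the list, so extending the "hint base" means adding rows, not writing new expressions.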

In the awful memoQ rule editor it looks like this:


Those who know Hebrew may note that some periods are out of place. I'm not an RTL expert, so I had a few issues with punctuation migrating as I moved data from one format to another, but someone familiar with issues like that can fix things without much ado. This was just a quick prototype to demonstrate feasibility. And a few minutes of search and replace work in a text editor beats entering more than 600 pairs manually in the built-in editor for memoQ. It would be nice if that damned editor included list import features that would read Excel files directly!

As with other auto-translation rules, certain characters may need to be represented by entities or uuencoding. The simple rule shown above can also be made more robust by dealing with variable punctuation, for example. Complexity can always be added. 
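
As one possible way of dealing with variable punctuation, the Python sketch below derives a more tolerant pattern from a canonical abbreviation, making each period optional and allowing one optional space between the parts. The helper name tolerant_pattern and the sample abbreviation are assumptions for illustration; this is plain Python regex, not memoQ rule syntax, and the loosened pattern will also accept a bare space between the letters.

```python
import re


def tolerant_pattern(abbr: str) -> re.Pattern:
    """Build a pattern that accepts variants such as "z.B.", "z. B.", "z.B" and "zB"."""
    # Split the canonical form on periods and drop empty pieces.
    parts = [p for p in abbr.split(".") if p]
    # Between parts: optional period, then optional whitespace character.
    body = r"\.?\s?".join(re.escape(p) for p in parts)
    # Optional final period; the lookarounds prevent matches inside longer words.
    return re.compile(r"(?<!\w)" + body + r"\.?(?!\w)")


if __name__ == "__main__":
    pattern = tolerant_pattern("z.B.")
    for sample in ["z.B.", "z. B.", "zB", "zBeispiel"]:
        print(sample, bool(pattern.search(sample)))
```

The trade-off is the usual one: the more variants a pattern tolerates, the more likely it is to produce a false hit, so it is worth checking against real text before loosening every entry.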

Many thanks to the translator colleague who shared his challenge and gave me something fun to do after a grueling day of mapping many messed-up date formats from a lot of different source languages I mostly don't know :-)
