Translation Tribulations: SDL Trados Studio


Jan 28, 2020

Another look at Windows 10 speech recognition

A few years ago while on "holiday", I returned from dinner to find that my laptop had bluescreened. Panic time! It was Saturday night, and I still had quite a lot of text to translate and deliver on Monday morning. And up on the highest mountain in Portugal, I wasn't sure where I could find a replacement to finish the project, which was, at least, not utterly lost, because I had put it on a memoQ Cloud server for testing. The next day I got lucky: about 50 km away there was a Worten, where I picked up a gamer laptop with lots of RAM and an SSD. Well, not so lucky, as it was a Hewlett Packard Omen, with a fan prone to failure, but that's another story....

This new laptop was my first encounter with Windows 10. I had heard that this operating system offered improved speech recognition capabilities, and since I prefer to dictate my translations and downloading the 3 GB installation file for Dragon NaturallySpeaking (DNS) from my server at the office was going to take forever, I thought I would give Windows 10 speech recognition a try. I hadn't installed my CAT tool of choice yet, so I fired up Microsoft Word and began dictating. "Not bad," I thought. Then I tried it in my translation environment, and the results were a complete disaster. So I put that mess out of my mind.

Since then there have been some notable advances in speech-to-text capabilities on a number of platforms. But the best solution for my languages (German and English) with DNS became increasingly cranky thanks to neglect of the product by Nuance. Every week I read new reports of trouble with DNS in a variety of environments in which it used to perform very well. Apple's iOS 13 was a great leap forward of sorts for speech recognition and voice-controlled editing, but the new features are only available in English, and having Voice Control activated totally screws up my otherwise rather good dictation in German and Portuguese (or any other language). And don't get me started on the crappy vocabulary addition feature, which uses text entry alone with no link to actual pronunciation. Good luck with that garbage. It's not a bad solution in Hey memoQ with the additional command features added, but iOS dictation is not completely up to reasonable professional standards yet.

I probably would have given no further thought to Windows 10's speech-to-text features if it weren't for Anthony Rudd. We've corresponded a bit since I bought his excellent book on regular expressions for translators (and there's another practical guide for us coming soon from him!), and in a recent discussion he alluded to the use of Unicode with regex as a simple way of dealing with some things another colleague was struggling with. I was intrigued by this, and so for about half a day, I ran down a rabbit hole, testing Unicode subscripts and superscripts for a variety of purposes like fixing bad OCR of footnote markers and empirical formulae, autocorrecting common expressions for subscripted variables and chemical terms, including subscripts and superscripts in term bases and much more. Fascinating and useful stuff on the whole, even if some fonts don't support it well.

And of course I looked at using these special Unicode characters in speech-to-text applications. DNS had some funky quirks (not allowing numbers in the "spoken" version of terms, for example), but it worked rather well, so I can now say "calcium nitrate formula" and get Ca(NO₃)₂ without much ado. And for some reason it occurred to me to give Windows 10 speech recognition a try, just because I was curious whether vocabulary could in fact be trained. Indeed it can, and that feature is better than iOS 13 or DNS by far.
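For readers who would rather script the conversion than type the special characters, here is a minimal Python sketch of the idea. It is only an illustration: the helper name `subscript_formula` is hypothetical, and the rule it applies (subscript any digits that directly follow a letter or a closing parenthesis) is an assumption that suits simple empirical formulae but would wrongly subscript ionic charges such as the "2" in "Ca2+".

```python
import re

# Translation table from ASCII digits to the Unicode subscript digits U+2080..U+2089.
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def subscript_formula(formula: str) -> str:
    """Subscript digit runs that directly follow a letter or ')'.

    Hypothetical helper for illustration only; leading coefficients
    such as the "2" in "2 H2O" are left untouched.
    """
    return re.sub(
        r"(?<=[A-Za-z)])\d+",
        lambda m: m.group().translate(SUBSCRIPT_DIGITS),
        formula,
    )


print(subscript_formula("Ca(NO3)2"))  # Ca(NO₃)₂
print(subscript_formula("2 H2O"))     # 2 H₂O
```

The output strings can then be pasted into a term base, an autocorrect list or a speech vocabulary entry, subject to the font support caveat mentioned above.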

But first I had to remember how to activate speech recognition for Windows on my laptop again. When in doubt, type what you're looking for in the search box....

Notice I've pinned Windows Speech Recognition to my taskbar on the right, which is good for quick tasks.

Gesucht, gefunden. Unlike other speech recognition solutions, the one in Windows 10 works only for the language set for the operating system. And options there are limited to English (United States, United Kingdom, Canada, India, and Australia), French, German, Japanese, Mandarin (Chinese Simplified and Chinese Traditional) and Spanish.

I put on my trusty Plantronics earset (the best microphone I've used for dictation tasks or audio in my occasional webinars in the past year) and began to dictate, first in Microsoft Word, which had shown acceptable results in my tests long ago. I found that adding vocabulary in the Speech Dictionary (accessed via the context menu in the dictation control element shown as a graphic at the top of this post) was dead simple.

The option to record pronunciation enabled me to record non-English names and words in several languages. And sure enough, the Unicode subscripts and superscripts worked, so I can now say CO₂ (I just dictated that) to my heart's content.

I was expecting a mess when I tried to use Windows 10 speech-to-text in a CAT tool, but it was not to be. It was brilliant, actually. I tried it in my copy of SDL Trados Studio, and with the scratchpad disabled so I could dictate directly into the target it worked well. No voice-controlled editing like I'm used to with DNS in memoQ, but that DNS feature does not work in SDL Trados Studio anyway, so this is no worse. But with the scratchpad box enabled (see the screenshot below), I could use voice commands to select and correct text or perform other operations. Brilliant!

After clicking or speaking "Insert", the text will be written to the target field with the proper formatting

So users of SDL Trados Studio who translate to a target language supported by Windows 10 speech recognition are probably better off not giving their money to Nuance, which I'm told can't even be bothered to make a 64-bit version of DNS now (which probably accounts for a lot of the trouble people have with that program).

I tested Wordfast Pro 5, which seems to confuse the speech recognition tool horribly, with source text displayed in the floating bar for some odd reason. But my earlier tests of Wordfast with DNS were equally unhappy, so somehow I'm not surprised. And I didn't test the Memsource desktop editor, which took the prize a few years ago for the worst-ever DNS dictation results with a CAT tool. I'll leave that to someone with a much wider masochistic streak.

But what about memoQ, my personal environment of choice for most translation work? Equally brilliant, works just the same as SDL Trados Studio. No voice control for editing without the dictation scratchpad enabled (there, DNS has an advantage in memoQ), but with the scratchpad you can use the voice commands to edit before inserting in the target text field.

Wanna see this in action? Have a look at this short demo video:

I hope that the future will bring us more language support for Windows 10 dictation (Portuguese, Russian and Arabic, please!) and that other providers (like Google, if you're listening, and Apple, which never listens to anyone anymore except to spy on them with Siri) will expand the speech-to-text features offered, particularly to include sound-linked vocabulary training and better adaptation to individual users' speech. Five years ago when I began to investigate alternatives for non-DNS languages, I expected we would have more by now, and we do, but professional needs require all providers to raise their game.

Addendum: Someone asked me if Windows Speech Recognition is a cloud resource or a locally installed one which will work without an Internet connection. It's definitely the latter. So if you have lousy bandwidth or find yourself disconnected from the Internet, you can still use speech-to-text features.

And more: I use a lot of spoken commands for keyboard shortcuts when I work, so I did a little research and testing. It seems that Windows 10 speech recognition gives full access to an application's keyboard shortcuts via voice. So in memoQ, for example, I can dictate the insertion of tags, items from the Translation Results pane and a lot more. Watch out, Nuance. Windows 10 is going to kick your Dragon's scaly butt!

Aug 26, 2019

Exporting compatible XLIFF (XLF) bilingual files from memoQ

Here we go again. Although memoQ is the undisputed leader for compatibility and interoperability among translation environment tools, users still encounter problems exchanging files, particularly XLIFF of some sort, with users of other tools. This is not because of any actual difficulty producing compatible XLIFF files, but rather a matter of deficient tool training and the failure to date by memoQ product designers to make the ease of interoperability a little more obvious. Some other tools, like recent versions of SDL Trados Studio, come pre-configured on installation to recognize the proprietary file extensions for memoQ's flavor of XLIFF ("MQXLIFF") and renamed ZIP packages (MQXLZ) containing XLIFF files, but others (or versions of SDL Trados Studio from many years ago) need to be configured to recognize those extensions, or someone simply has to change the MQXLIFF file extension to an extension that will be recognized by any tool: *.xliff or *.xlf are the choices.

The two-step solution is shown here:

On the Documents ribbon in memoQ, click on the tiny arrow under the Export icon and choose the option to export a bilingual file. There is some blue text which, if clicked, will allow a compatible XLIFF file to be exported, albeit with the MQXLIFF extension that some other programs might not recognize.

When the Export button in the dialog (marked 1, above) is clicked, the Save As dialog (marked 2, above) appears. Simply change the file extension (the part after the period) to "xlf", for example. Then any program that reads XLIFF files can work with the file you export from memoQ. Despite the change of extension, memoQ will still recognize the file it produced, so it is possible to re-import it, for example if another person has made corrections to the XLIFF file that you want to use to update your translation or reference resources.

In some much older versions of memoQ, it does not work to change the extension in the export dialog; this has to be done directly to the exported file in whatever folder you save it in.
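If a whole folder of exported files needs the new extension, the renaming can also be scripted. The following Python sketch is just one way to do it and is not part of memoQ: the function name `copy_as_xlf` is hypothetical, and it assumes the exports carry a lowercase ".mqxliff" extension. It copies rather than renames, so the original exports stay in place.

```python
from pathlib import Path
import shutil


def copy_as_xlf(folder: str) -> list[Path]:
    """Copy every *.mqxliff file in `folder` to a sibling file ending in .xlf.

    Hypothetical helper for illustration. Existing .xlf files are never
    overwritten, and the original .mqxliff files are left untouched.
    """
    created = []
    for src in sorted(Path(folder).glob("*.mqxliff")):
        dst = src.with_suffix(".xlf")
        if not dst.exists():
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            created.append(dst)
    return created
```

Calling `copy_as_xlf("exports")` on a folder named "exports" (an example name) returns the list of newly created .xlf paths, which can then be sent to users of other tools.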

Of course, all of this will be rather difficult if you are one of those users who has not fixed the awful Microsoft Windows default to hide the extensions of known file types. Fixing that particular stupidity requires slightly different measures in different versions of Windows, but in Windows 10 you can do that on the View ribbon of Windows Explorer by marking the choice to show file name extensions:

Jun 17, 2018

Ferramentas de Tradução - CAT Tools Day at Universidade Nova de Lisboa

The Faculty of Sciences and Humanities held its first "CAT Tools Day" on June 16, 2018 with a diverse program intended to provide a lusophone overview of current best practices in the technologies to support professional translation work. The event offered standard presentations and demonstrations in a university auditorium with parallel software introduction workshops for groups of up to 18 persons in an instructional computer lab in another building.

The day began with morning sessions covering SDL Trados Studio and various aspects of speech recognition.

Dr. Helena Moniz explains aspects of speech analysis.

I found the presentation by Dr. Helena Moniz from the University of Lisbon faculty to be particularly interesting for its discussion of the many different voice models and how these are applied to speech recognition and text-to-speech synthesis. David Hardisty of FCSH at Universidade Nova also gave a good overview of the state of speech recognition for practical translation work, including his unobtrusive methods for utilizing machine pseudo-translation capabilities in dictated translations.

Parallel introductory workshops for software tools included memoQ 8, SDL Trados Studio 2017 and ABBYY FineReader - two sessions for each.

Attendees learned about ABBYY FineReader, SDL Trados Studio and memoQ in the translation computer lab

The ABBYY FineReader session I attended gave a good overview in Portuguese of basics and good practice, including a discussion of how to avoid common mistakes when converting scanned documents in a number of languages.

The afternoon featured several short, practical presentations by students, discussions by me regarding the upcoming integrated voice input solution for memoQ and the preparation of PDF files for reference, translation, print deadline emergencies and customer relations.

Rúben Mata discusses Discord

The final session of the day was a "tools clinic" - an open Q&A about any aspect of translation technology and workflow challenges. This was a good opportunity to reinforce and elaborate on the many useful concepts and practical approaches shown throughout the day and to share ideas on how to adapt and thrive as a professional in the language services sector today.

Hosts David Hardisty and Marco Neves of FCSH plan to make this an annual event to exchange knowledge on technology and best practices in translation and editing work in discussions between practicing professionals and academics in the lusophone community. So watch for announcements of the next event in 2019!

Some of the topics of this year's conference will be explored in greater depth in three 25-hour courses offered in Portuguese and English this summer at Universidade Nova in Lisbon. On July 9th there will be a thorough course on memoQ Basics and workflows, followed by a Best Practices course on July 19th, covering memoQ and many other aspects of professional work. On September 3rd the university will offer a course on project management skills for language services, including the memoQ Server, project management business tools, file preparation and more. It is apparently also possible to get inexpensive housing at the university to attend these courses, which is quite a good thing given the rapidly rising cost of accommodation in Lisbon. Details on the housing option will be posted on this blog when I can find them.

Nov 15, 2017

memoQ Cloud subscriptions and credit card tribulations

Kilgray's memoQ Cloud service is a very convenient platform for learning and testing the latest features of the memoQ server and operating a server for small teams without the hassles of maintaining the server infrastructure and software in-house. For those considering a dedicated server for their company or institution, it offers an excellent opportunity for pilot testing at low or no cost depending on how long you use it. The first month is free; after that the monthly charges (to a credit card) currently start at EUR 160 or USD 175 for an account with one project manager license and 5 web access licenses (to which anyone with a licensed copy of memoQ can connect, without the need to include Translator Pro licenses in the subscription - these are needed only if users without a license will be connecting with the desktop editions of memoQ).

I use the memoQ Cloud server occasionally for shared projects, because it allows me to configure files and resources (of all kinds) more conveniently than file-swapping by e-mail or Dropbox folders and provide better support to my team members. For €160 per month in the months I need it I am on equal footing with any large agency with a memoQ server for the team sizes I want to work with. And I can even share the translation memory resources with colleagues who use SDL Trados Studio using the free Kilgray plug-in for that platform which enables access to any memoQ server online (with an access account created).

The only disadvantage of this service for me is Kilgray's annoying tendency to force upgrades much too soon in the release cycle. This won't matter at all to someone testing the memoQ Cloud server to evaluate the latest release; in fact, this is helpful to avoid the occasional server setup difficulties with new versions on which the paint has not yet dried so you can focus on evaluating features and general stability. But if you are in the middle of a big project, this can be a nuisance, and I assume it happens more often now, since Kilgray's current strategy involves more frequent minor version releases. If there is a compatibility problem between the latest release and a team member's memoQ software version, and that person isn't current with the annual maintenance and support plan (which includes free upgrades), they will be stranded for access from their memoQ desktop application until the missed annual fees are paid up.

But until today there was another mysterious hassle that I finally got sorted out. When I first started using memoQ Cloud, I paid the subscription with a US credit card from an old credit union account there. No problems. However, when I incorporated my business in my current country of residence and tried to use a card from my business account there, it never worked, and the explanation screen was displayed for only a brief time, with the text completely garbled due to an incorrect codepage specification for the web page. The first time this happened, I assumed the problem was Kilgray's, and after some back-and-forth with support, the company kindly made an inconvenient exception to their "credit card only" rule and sent me a normal invoice to pay by bank transfer. This isn't a usual thing as I have learned from some frustrated potential corporate customers who don't want to pay by credit card, so I am grateful that something was worked out in that case so I could get on with some urgent teamwork.

After a break of six months or so, the need for a cloud server arose again, and again I had the same trouble with my business credit card. After grumbling briefly to a friend at Kilgray who had sorted the mess out before, I decided to call my bank, because in the meantime my reading skills had improved enough that I was fairly sure that the trouble had nothing to do with Kilgray. Indeed.

The credit card verification and approval service used by Kilgray for web payment is 3-D Secure. In the case of my bank, this service is not available for credit card payments unless its activation is specifically requested. Such a thing never occurred to me, because I use the same card with Amazon and others to order dictionaries and other work materials. As the technician at my bank's help desk explained, there are several different payment approval systems for web transactions with a credit card, and it's merely a coincidence that the others I have dealt with haven't used 3-D Secure. He activated the service immediately (no cost), and five minutes later my memoQ Cloud subscription was renewed with the means of payment I preferred to use.

So it was in fact not Kilgray's problem at all, but it's probably a good idea for their support staff to take note of this scenario, because I am surely not the only one who got tripped up by 3-D Secure not being activated for my card. I am sort of embarrassed that I didn't think of this possibility earlier, but I don't do a lot of shopping online, and for minor stuff if one card fails for reasons unknown, I just shrug and use another. In fact, I think the same problem may have occurred with an airline ticket last spring, but I never associated that with my earlier troubles.

So now I'm up and running with the server version 8.2.5 on the memoQ Cloud, hoping I can finish my training project on that version before the impending release of memoQ 8.3 and the possibility of an upgrade before I get the work done. Tick, tick, tick....

Jun 5, 2017

Optimizing term properties for many entries in a memoQ termbase

Terminology: On my wishlist: an easier way to deal with termbases imported into MemoQ in Studio packages. Especially annoying: the habit of many Studio users capitalizing termbase entries & thus torpedoing recognition. It would seem the default setting in MultiTerm is fuzzy matching.

memoQ is noted for its compatibility with SDL Trados Studio files and projects; with the latest release of memoQ (version 8.1) there is apparently full compatibility with tracked changes in SDLXLIFF files and with Studio's translation quality assurance. However, there are apparently a few little points remaining to satisfy some.

The opening comment is from a colleague who seems to experience less than optimal matching for terminology which is imported as part of a memoQ project using an SDL Trados Studio package (SDLPPX). The solution to this person's frustration is fairly simple, however, and it is useful in many other cases where the properties of terms in a memoQ glossary are not well optimized.

Many people are unaware of the fact that it is possible to change any of the term properties for a large number of terms at once. To do this, simply open the memoQ termbase for editing and select the terms to change. Multiple selections can be made by holding down the Shift key and clicking on the desired range of rows or by using the control key to mark individual selections. Then simply set the desired property (such as fuzzy matching) and the change will be applied to all of the selected terms.

About four years ago, when fuzzy term matching was introduced by Kilgray, I made a short video about this. The memoQ interface has changed a little since then, but the procedure works just as well today:

Mar 29, 2017

Get started April 10th with extension development for SDL Trados 2017

On April 10th, 2017 at 4 pm (UTC) there will be a free webinar for those interested in the basics of development for SDL Trados Studio 2017 using SDL's application programming interfaces and software development kits.

Romulus Crisan, an SDL Language Platform Evangelist Developer, will guide you through:

configuring the development environment
new APIs introduced with the Studio 2017 release
upgrading current plug-ins to support Studio 2017
building a simple editor filtering plug-in

Heads-up to Kilgray developers: maybe here you can figure out how to fix problems with that cool plug-in that allows SDL Trados Studio 2014 and 2015 users to read and write memoQ Server TMs so that it will also work with the latest Clujed Maidenhead Madness.

You can register for the webinar here.

Mar 23, 2017

First month with SDL Trados 2017

A month ago, when I announced the Great Leap Forward from my rather neglected SDL Trados 2014 license to the latest, presumably greatest version, SDL Trados 2017, after seeing how wet the largely untested release of memoQ 8 (aka Adriatic) has proved to be, there was some surprise, as well as smiles and frowns from various quarters. It's been a busy month, and I am still testing options for effective workflow migration and exchange (useful in any case given how often memoQ users work together with those who prefer SDL tools) as well as discussing the good and bad experiences of friends, colleagues and clients who use SDL Trados Studio 2017.

As can be expected, this product has more than a bit of a bleeding edge character, though on the whole it does seem to be a little more stable and less buggy than memoQ Adriatic so far, with fewer what the Hell were they smoking moments. However....

I was a little concerned at the report from a colleague in Lisbon that the integration of the plug-in for SDL Trados Studio access to Kilgray Language Terminal and memoQ Server translation memories doesn't work with SDL Trados 2017 after functioning so well in SDL Trados 2014 and 2015. Despite the stupid inter-company politics between SDL and Kilgray, which hindered the approval of the plug-in so that a warning dialog appeared each time it was loaded in SDL Trados Studio (bad form by the boys in Maidenhead), it was a great tool for users of SDL Trados Studio and memoQ to share TMs in small team projects. I was very happy with how it worked with SDL Trados Studio 2014, and I am very disappointed to see that API changes in the latest version have bunged things up so that Kilgray will have more work to re-enable this useful means of collaboration. I hope that SDL will see fit to be less petty and more cooperative with the upcoming "fixed" plug-in! It is in their interest to do so, as this makes it easier for SDL Trados users to stick to their favorite tool while working on jobs for or with those who prefer memoQ as their resource. Better work ergonomics for everyone and no BS with CAT wars.

I was pleased to see that SDL Trados Studio has added AutoCorrect facilities recently. And they seem to work reasonably well in English and mostly in German, though there was a strange quirk which hamstrung the "correct as you type" feature. That setting took a while to "stick" somehow when I tested it first with German. It was fine for Portuguese too. However, Ukrainian and Arab colleagues can't get it to work for some reason. I did not believe this at first until a colleague in Egypt showed me live via shared screens in Skype how the autocorrection simply failed to activate. Perhaps this is an issue with languages that don't use the Roman alphabet, so I suppose colleagues in Russia, Serbia, Japan and elsewhere may be tearing some hair out over this one. It doesn't affect me directly, but it looks like a pretty serious bug that ought to be addressed ASAP.

SDL generally kicks some butt with regex facilities in SDL Trados Studio; customer service guru Paul Filkin has written a lot about these features on his Multifarious blog, and most advanced users of the platform make heavy use of regular expressions in filters and QA rules. For a long time, memoQ users could only look on in envy at all the excellent possibilities before Kilgray belatedly added more regex options to its work environment. However, there are a few raw rubs remaining.

My Arabic translator friend pinged me recently to ask if I was aware of the "regex trouble" in the latest Studio version. He made heavy use of these features for Arabic and English work in some rather amazing, creative and inspiring ways (I had not imagined) in earlier versions of SDL Trados Studio, and some of these features are rather broken at present in SDL Trados 2017. He gave me a very useful tutorial (which I had planned to beg him for anyway soon) in the use of regex in SDL Trados Studio for basic filtering, advanced filtering and QA checks. Overall I was very impressed with the possibilities, but the failure of some regular expressions which worked well in the advanced filters to work at all in the basic filter or in QA rulesets was very disturbing. We argued a little about what the basis of the problem could be in the software programming, but it is a major problem which limits the functionality of SDL's latest software severely and should cause advanced users and LSPs to wait and watch for the fix before upgrading to the latest version. The persistence of such a major flaw in such an important area as quality assurance some 6 months after release is frankly shocking. I hope this will be addressed very soon so that I can migrate and upgrade some of my favorite QA routines from memoQ.

Last but not least is an irritating bug in an auxiliary feature for what has always been one of my favorite terminology tools, MultiTerm. It was the first Trados product many years ago, and despite many quirks over the decades, it remains one of the best. Face it: the memoQ terminology model is OK for most practical uses, but for maintaining high quality corporate terminologies tracking many important attributes it is hopeless garbage. Most other CAT tool terminology databases and glossaries are far worse. MultiTerm sets the standard today still for affordable, flexible, powerful terminology management. For 17 years I have used this excellent platform for my best terminologies for my best clients and delighted in its output management options (even when they can be a pain in the butt to configure properly).

When I want to access my high value MultiTerm resources while translating in memoQ or working in web pages or MS Word, I use the convenient MultiTerm widget to access the data. However, I am very disappointed to find that recent versions do not display the attributes for terms when the widget is used for lookup. Damn. That makes the results just as annoying as the lobotomized MultiTerm/TBX imports into memoQ. I really hope that SDL fixes this flaw ASAP and remains on top of the terminology game with MultiTerm and its lookup tools as a valuable resource even for translators who hate Trados Studio and won't use it.

Overall I am seeing a lot of nice things in SDL Trados Studio 2017, and I would say it is probably more mature and stable than memoQ 8 at this point. But it really is just a late-stage beta release, and more fixes are needed before I can trust it for routine production work. We are all better off for now to stick with the prior versions of both SDL Trados Studio and memoQ.

Feb 27, 2017

Planning special rules for structured "expressions" and multi-word abbreviations

Translators and editors often deal with what I'll call "structured expressions" or "patterned data" in many forms, which include:

long and short dates (2016-01-13; 1/13/16; 13.01.2016; January 13, 2016; 13th January 2016; etc.)
time expressions (14:35; 2:35 pm; 2:35 PM; 2:35 p.m.; etc.)
currency expressions (EUR 2.3 million; € 2,300,000; €2.3m; etc.)
legal references (Section 14a paragraph 3 line 2; section 14a (3) line 2; etc.)
bibliographical references for chapters, pages, margin notes, etc.
and much more.

There is also a wealth of abbreviations for multiple word expressions in some categories of text; favorites in German include:

in Verbindung mit (variously written as i.V.m., i. V. m., iVm or some typoed hybrid of the aforementioned with spaces and periods included or forgotten depending on the authors' preferences and degree of care)
im Sinne des (i.S.d., i. S. d., iSd, etc.)

These can be devilishly hard to check efficiently for consistency or other quality factors in a long text, and for the translation, there is often no single "right" way to format the target text equivalents, with many individual preferences to be found with translation buyers. Even with a good style guide (all too rare anyway), these issues can be challenging time-wasters.
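To make the problem concrete, here is a small Python sketch of how the variants of "in Verbindung mit" listed above might be caught with a single pattern and rewritten to one preferred form. Both the pattern and the choice of "i.V.m." as the target form are assumptions for illustration; a real ruleset would be tuned to the client's style guide and tested against far more examples.

```python
import re

# Assumed pattern: the letters i, V, m, each optionally followed by a period,
# with optional whitespace between them. The \b anchors keep it from firing
# inside longer words.
IVM_VARIANTS = re.compile(r"\bi\.?\s*V\.?\s*m\b\.?")


def normalize_ivm(text: str, preferred: str = "i.V.m.") -> str:
    """Rewrite every spelling variant of the abbreviation to `preferred`."""
    return IVM_VARIANTS.sub(preferred, text)


sample = "§ 5 i. V. m. § 6 und § 7 iVm § 8"
print(IVM_VARIANTS.findall(sample))  # ['i. V. m.', 'iVm']
print(normalize_ivm(sample))         # § 5 i.V.m. § 6 und § 7 i.V.m. § 8
```

The same pattern, used as a search expression rather than a replacement, gives a quick consistency check: any match that differs from the preferred spelling is a candidate for correction.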

Translation assistance tools such as Apsic Xbench, SDL Trados Studio and others, even memoQ, have various approaches to making life easier for a translator or editor faced with these challenges. Unfortunately for most people, these approaches usually involve the use of "regular expressions" or "regex" as nerds affectionately call it. Not an easy thing even for many hardcore techies!

On past occasions when I have written about the use of regex in translation tools, I have usually stated clearly that the best approach for the best, most reliable results is to have the regex "rules" for handling the text developed by a knowledgeable third party. The experts who deal with this stuff routinely can often reduce a task that would take a semi-skilled person like myself hours or even days to the time for a coffee break, and even if a task takes a while and runs up a bit of a bill, it's much more likely to be done right the first or second time.

But... there's a catch usually. Most of these regex fire-eaters are not skilled in mind reading, many are not translators, and even those familiar with translation challenges might not be familiar with your working languages or your particular subject areas and their possibly unique challenges. So effective communication is really, really important (it always is, of course, but here even more so if you are dealing with a verbally challenged, monolingual math freak who might be your local expert for regex).

Even for areas I know reasonably well and languages I more or less master, I am often frustrated by help requests from colleagues and clients who need special rulesets developed for a client's preferences for date and currency information, because the request is not clear in its scope and detail, and many important cases are left out, so the end result is not fully satisfactory.

Over the years and with a lot of back and forth (sometimes inside my own head with yours truly as my nightmare of a "client"), I have developed a system of simple documentation for planning and testing rules to help translate and quality check patterned information or multi-word abbreviations. This system provides an easy structure for non-techies (or even hardcore techies) to organize the help request for most efficient handling. Here is an example of part of such a planning sheet for a recent project involving Arabic:

When the time comes to test, just copy the source text column into a separate file, add whatever variations you want to the examples to test your accommodation of typos, etc. and then load that file as a "translation text" for testing in your working environment. If you have the same information for another, overlapping language pair, such as German and English, it is easy to couple that to make a ruleset which maps multiple source languages to a target language. An example of such a result is a memoQ auto-translation ruleset for mapping long dates and month-plus-day dates from German, English, French and Spanish into Portuguese which can be obtained here.

This simple, tabular approach to data collection to plan regular expression rules has made me a lot more efficient at such tasks and facilitated the re-use of data to make new rulesets for clients and colleagues (or myself) as needs arise. The liberal commenting of examples can be very helpful; information to include which could affect rule structure might involve capitalization, location in a sentence, variations or differences in particular contexts, etc.

For my own work, rulesets include a series for dates, currency and legal reference formats from German to English for generic and client-specific use for US and UK English. With the help of these tabular planning sheets, I can adapt any of these quickly for most other languages.
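As a rough illustration of what one such rule does, the following Python sketch converts German long dates to the US English long format. It is not one of the actual rulesets and does not use memoQ's auto-translation rule syntax; the names `MONTHS_DE_EN` and `german_long_date_to_us` are hypothetical, and the pattern assumes a day number with a period, a fully spelled month name and a four-digit year.

```python
import re

# Lookup table of German month names and their English equivalents.
MONTHS_DE_EN = {
    "Januar": "January", "Februar": "February", "März": "March",
    "April": "April", "Mai": "May", "Juni": "June", "Juli": "July",
    "August": "August", "September": "September", "Oktober": "October",
    "November": "November", "Dezember": "December",
}

# Day (1-2 digits) + period, optional space, month name, space(s), 4-digit year.
LONG_DATE_DE = re.compile(
    r"\b(\d{1,2})\.\s*(" + "|".join(MONTHS_DE_EN) + r")\s+(\d{4})\b"
)


def german_long_date_to_us(text: str) -> str:
    """Rewrite dates like '13. Januar 2016' as 'January 13, 2016'."""
    def repl(match: re.Match) -> str:
        day, month, year = match.groups()
        # int() drops a leading zero, so '03.' becomes '3'.
        return f"{MONTHS_DE_EN[month]} {int(day)}, {year}"

    return LONG_DATE_DE.sub(repl, text)


print(german_long_date_to_us("Vertrag vom 13. Januar 2016"))
# Vertrag vom January 13, 2016
```

A UK variant would only need a different format string in `repl`, which is the kind of quick adaptation the planning sheets are meant to support.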

For tracking the development of rules and their improvement history I have another set of templates which I use for systematic planning and identification of areas to improve. That will be discussed on another occasion.

Dec 15, 2016

Validating Roman numerals in translation QA

The issue of Roman numerals in my translation work has been at the back of my mind for a few years now, but the pain level had not been such that I got around to dealing with it. It comes up time and again in legal translation work: references to the "X. Senat" or the like which mess up segmentation (and require a bit of regex to do a new segmentation rule); references to "Art. VII" of some law (I need to catch the typos like "VIII"); source text errors like "VIIII"; and of course dates like MCMXXIV, etc. and century references.

For simple matters I used regex which would capture and reproduce "Roman numerals", but erroneous data using the right letters would also be accepted:

[MDCLXVI]+

That is, of course, rather useless for QA which checks the correctness of the expression in the source text. So with a bit of thought I came up with:

Without the word border syntax ("\b"), non-standard expressions like "VIIII" might appear to be validated in the interface of memoQ, for example, because the whole expression would be marked green in the source text, and one might not notice that it was resolved into "VIII" and "I".
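For anyone who wants to experiment outside a CAT tool, here is a Python sketch of strict Roman numeral validation. Note that this is not the expression referred to above; it is a commonly used pattern for standard numerals up to 3999, and the names `CANDIDATE`, `ROMAN` and `check_roman_numerals` are hypothetical. Candidates are first collected as whole words and then validated in full, which has the same effect as the word border syntax.

```python
import re

# Any whole word built only from Roman numeral letters (valid or not).
CANDIDATE = re.compile(r"\b[MDCLXVI]+\b")

# Assumed strict pattern: thousands, hundreds, tens, units in standard
# subtractive notation (1 to 3999). It can match the empty string, so the
# validator below rejects empty tokens explicitly.
ROMAN = r"M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
STRICT = re.compile(ROMAN)


def is_valid_roman(token: str) -> bool:
    """True only if the whole token is a well-formed Roman numeral."""
    return bool(token) and STRICT.fullmatch(token) is not None


def check_roman_numerals(text: str) -> list[tuple[str, bool]]:
    """Return every candidate in `text` with a flag for its validity."""
    return [(t, is_valid_roman(t)) for t in CANDIDATE.findall(text)]


print(check_roman_numerals("Art. VII, Art. VIIII, MCMXXIV"))
# [('VII', True), ('VIIII', False), ('MCMXXIV', True)]

# Without whole-word matching, only the valid prefix of "VIIII" is matched:
print(re.match(ROMAN, "VIIII").group())  # VIII
```

In running English text, ordinary words made of the same letters (such as the pronoun "I" or the word "MIX") would also be flagged as valid numerals, so a check like this works best on passages where numerals are expected.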

These expressions can be used in various ways in any CAT tool that supports regular expressions, such as SDL Trados Studio or memoQ.

If you want this typing aid and QA tool as a memoQ autotranslatable (along with a little demo data file), you can get it here.
