Jun 24, 2017

The multilingual toolkit for getting a date in Swahili

Some time ago, I was asked by IAPTI to provide some technical support for a developing effort to assist professional translators in various African regions. The flame of the Translators Without Borders center established a few years ago in Kenya has apparently sputtered out due to an incredibly silly anti-business model which undermined local professionals, so various initiatives were launched to help translators in the region grow stronger together and improve their professional practice.

Since memoQ is perhaps the best tool for managing the challenges of expert translation under the widest range of languages and conditions, I considered how I might contribute to solving some of these and reduce the frustrations of language barriers in Africa. I thought of all the business travelers there, as well as the NGOs and representatives of governments around the world who want a piece of what's there. All alone, strangers in a strange land, sweltering in some Nairobi hotel, how can these people even get a date in Swahili?

Once again, it's Kilgray to the rescue... with memoQ's auto-translation rules!

Using the various methods I have developed and published for planning and specifying auto-translation rules, I assembled an expert team for translation in Swahili, Arabic, Hebrew, English, German, Portuguese, Spanish, French, Russian, Hungarian, Dutch, Finnish, Polish and Greek to draft the rules for getting long dates in Swahili.

And using the Cretinously Uncomplicated Process for Identifying Dates (CUPID), these results can be transmogrified quickly to support lonely translators working from German, French and English into Arabic or from German, French, English and Spanish into Portuguese, for example, or in any combination of the languages applied for Swahili dates or others as needed.

With memoQ and regex-based auto-translation, you'll never be stuck for a quality-controlled date in any language!
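For readers who have never seen the idea behind regex-based auto-translation, here is a minimal sketch in Python. It is not memoQ's rule syntax, and the function name `long_date_to_swahili`, the lookup table and the day-month-year target order are my own assumptions for illustration: a pattern captures the parts of an English long date, and a lookup table supplies the month name in the target language.

```python
import re

# Swahili month names keyed by English month name (illustrative lookup table).
MONTHS_SW = {
    "January": "Januari", "February": "Februari", "March": "Machi",
    "April": "Aprili", "May": "Mei", "June": "Juni",
    "July": "Julai", "August": "Agosti", "September": "Septemba",
    "October": "Oktoba", "November": "Novemba", "December": "Desemba",
}

# Matches US-style long dates such as "June 24, 2017".
LONG_DATE_EN = re.compile(
    r"\b(" + "|".join(MONTHS_SW) + r")\s+(\d{1,2}),\s*(\d{4})\b"
)


def long_date_to_swahili(text: str) -> str:
    """Rewrite each English long date in text as day, month, year in Swahili."""
    def convert(match: re.Match) -> str:
        month, day, year = match.groups()
        # int(day) drops any leading zero; the target order is an assumption.
        return f"{int(day)} {MONTHS_SW[month]} {year}"

    return LONG_DATE_EN.sub(convert, text)


print(long_date_to_swahili("The meeting is on June 24, 2017 in Nairobi."))
```

A real rule set would need one pattern per source date format and per language pair, which is where the planning and specification work comes in.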

Germany needs Porsches! And Microsoft has the Final Solution....

I hear that Germany is suffering from a shortage of Porsches. Odd, given that the cars are made there and should be readily available, but it's true, because my friend who lives there told me. He owns a large, successful LSP (Linguistic Sausage Production) company, and to celebrate its rise in revenues, he decided to get everyone on the sales staff a new Porsche as a company car. The problem is that he can't find any for €5000.

So he was left with no choice but to cut overhead using the latest technologies. Microsoft to the rescue! With Microsoft Dictate, his crew of intern sausage technologists now speak customer texts into high-quality microphones attached to their Windows 10 service stations, and these are translated instantly into sixty target languages. As part of the company's ISO 9001-certified process, the translated texts are then sent for review to experts who actually speak and perhaps even read the respective languages before the final, perfected result is returned to the customer. This Linguistic Inspection and Accurate Revision process is what distinguishes the value delivered by Globelinguatrans GmbHaha from the TEPid offerings of freelance "translators" who won't get with the program.

But his true process engineering genius is revealed in Stage Two: the Final Acquisition and Revision Technology Solution. There the fallible human element has been eliminated for tighter quality control: texts are extracted automatically from the attached documents in client e-mails or transferred by wireless network from the Automated Scanning Service department, where they are then read aloud by the latest text-to-speech solutions, captured by microphone and then rendered in the desired target language. Where customers require multiple languages, a circle of microphones is placed around the speaker, with each microphone attached to an independent, dedicated processing computer for the target language. Eliminating the error-prone human speakers prevents contamination of the text by ums, ahs and unedited interruptions by mobile phone calls from friends and lovers, so the downstream review processes are no longer needed and the text can be transferred electronically to the payment portal, with customer notification ensuing automatically via data extracted from the original e-mail.

Major buyers at leading corporations have expressed excitement over this innovative, 24/7 solution for globalized business and its potential for cost savings and quality improvements, and there are predictions that further applications of the Goldberg Principle will continue to disrupt and advance critical communications processes worldwide.

Articles have appeared in The Guardian, The Huffington Post, The Wall Street Journal, Forbes and other media extolling the potential and benefits of the LIAR process and FARTS. And the best part? With all that free publicity, my friend no longer needs his sales staff, so they are being laid off and he has upgraded his purchase plans to a Maserati.

The other sides of Iceni in Translation

The integration of the online TransPDF service from Iceni in memoQ 8.1 has raised the profile of an interesting company whose product, the Infix PDF Editor, has been reviewed before on this blog. TransPDF is a free service which extracts text content from PDF files, converts it to XLIFF for translation in common translation environments, and then re-integrates the target text from the translated XLIFF to create a PDF file in the target language.

This is a nice thing, though its applicability to my personal work is rather limited, as not many of my clients would be enthusiastic if I were to send PDF files as my translation results. Sometimes that fits, sometimes not. And of course, some have raised the question of whether using this online service is compatible with some non-disclosure restrictions.
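Since the whole round trip depends on the translated XLIFF coming back complete, a quick check before re-integration can be useful. The following is only a sketch under stated assumptions: I assume an XLIFF 1.2 file (I have not verified which XLIFF version TransPDF produces), and the sample content and the function name `untranslated_ids` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2; assumed here, not confirmed for TransPDF output.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Invented sample file with one translated and one untranslated unit.
SAMPLE = (
    '<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">'
    '<file original="sample.pdf" source-language="en" '
    'target-language="pt" datatype="plaintext"><body>'
    '<trans-unit id="1"><source>Hello</source><target>Olá</target></trans-unit>'
    '<trans-unit id="2"><source>Goodbye</source></trans-unit>'
    '</body></file></xliff>'
)


def untranslated_ids(xliff_text: str) -> list:
    """Return the ids of trans-units whose target is missing or empty."""
    root = ET.fromstring(xliff_text)
    missing = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        target = unit.find("x:target", XLIFF_NS)
        if target is None or not (target.text or "").strip():
            missing.append(unit.get("id"))
    return missing


print(untranslated_ids(SAMPLE))
```

The check only looks at plain text inside each target, so a unit whose target holds nothing but inline tags would also be reported as untranslated.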

I think it's a good thing that Kilgray has provided this integration, and I hope others follow suit, but for the cases where TransPDF doesn't meet the requirements of the job, it is useful to remember Iceni's other options for preparing text for translation.

Translatable XML or marked-up text export
For as long as I can remember, the Infix PDF Editor has offered the option to export text on your local computer (avoiding potential non-disclosure agreement violations) so that it can be translated and then re-imported later to make a PDF in the target language. Only the location of this option in the menus has changed: the menu choices for the current version 7 are shown below.

This solution suffers from the same problem as the TransPDF service: not everyone will be happy with the translation in PDF, as this complicates editing a little. However, I find the XML extract very useful for putting the content of PDF files into a LiveDocs corpus for reference or term extraction. The fact that Infix ignores password protection on PDFs is also helpful sometimes.

"Article" export
The Article Tool of the Iceni Infix PDF Editor enables various text blocks on different pages of a PDF file to be marked, linked and extracted in various translatable formats such as RTF or HTML. The quality of the results varies according to the format.

Once "articles" are defined, they are exported via the command in the File menu:

The RTF export has some problems, as this view in Microsoft Word with the format characters made visible reveals:

However, the Simple HTML export opened in Microsoft Word shows no such troubles (and can be saved in RTF, DOCX or other formats):

Use of the article export feature requires a license for the Infix PDF Editor, unlike the XML or marked-up text exports for translation. In demo mode, random characters are replaced by an "X" so that one can see how the function works but not receive any unjust enrichment from it. However, this feature has significant value for the work of translators and is well worth an investment, as the results are typically better than using OCR software on a "live" (text-accessible) PDF file.

But wait... there's more!
Version 7 also has an OCR feature:

I tested it briefly on some scanned Portuguese Help Wanted ads that I'll probably use for a corpus linguistics lesson this summer; the results didn't look too awful, all things considered. This feature is worth a closer look as time permits, though it is unlikely to replace ABBYY FineReader as my tool of choice for "dead" PDFs.