Jun 24, 2022

The Invisible Hand of Social Media

We've probably all been there by now. The evolution of social media platforms has seen the "rise of the machines": artificial (non-)intelligence, programmed by unintelligent humans, monitoring and managing our actions, often with little or no possibility of pushback or with correction processes that waste inordinate amounts of time. We see this on Facebook, LinkedIn and other media which are too often also major venues for business communication and technical support for the professional tools we use. In the case shown in the screenshots above, my "wicked" comment was made with regard to an economic theory promulgated by the long-dead white Scotsman Adam Smith, and I was given the opportunity to protest to an external board after the fourth-world budget help desk confirmed that I was indeed a bad person promoting terrorism. I wrote:
There was a discussion about the abuses of capitalism and the flawed thinking of Adam Smith's "invisible hand", which he thought guided markets to do the right thing without control. As we all know, it is necessary for societies to exercise some control to protect health and safety, the environment etc. Blind belief in the invisible hand too often leads to tragedy -- in a sense, this invisible hand concept points the middle finger at us humans too often. So I suggested a metaphorical amputation of a metaphorical middle finger on a metaphorical invisible hand, meaning that we need legislation, etc. to protect against abuses in an unrestricted market. No real violence against any living beings whatsoever was suggested. The AI used by Facebook is functionally idiotic.

Consider also that if the hand is invisible the metaphorical blood must be too, so amputation should shock nobody, as any gore will also be invisible ;-)

Now all that is just a bit of amusement over coffee in the morning. But other cases are more serious, like the automated banning of a Ukrainian software developer for two months on LinkedIn because Putinist trolls objected to him trying to describe the reality of serving his customers while coffee breaks are accompanied by missile strikes and business-as-usual genocide. It seems that a certain volume of complaints can trigger a ban even when no specific violation can be identified. Artificial intelligence indeed.

Entirely too much trust is granted to automation, almost as a reflex, even when it obviously contradicts logic and common sense. In the translation sector, I encounter time and again the idea that language services should be cheaper if the work involves text pre-processed by machine translation such as DeepL, even when it can be demonstrated that achieving the desired level of quality takes longer than if the text were translated by human effort alone. So time isn't money for these people, I guess. The real issue is slavelancing.

Some postulate that most people need to believe in a Higher Power of some sort and take from it the direction and meaning of their lives. I used to dispute that, but seeing now how so many educated people in the business world have replaced Ningirsu, Shiva, Allah, Nossa Senhora de Fátima and Mighty Cthulhu with the new God of MTness, I suspect I may have been wrong.

As with most religions, it's our people and their welfare who are the true victims of this automation religion, while the priests argue that the real problem is that we haven't sacrificed enough to that invisible middle finger the technologists wave in our faces.

But I ask, what's the harm of paying a few more cents if you must if it leads at last to a lot more sense and better understanding for us all?

Jun 17, 2022

memoQ Inside Out: Templates for Translators

In the summer of 2021, I was coaching a group of project managers and translators at a Portuguese service company, helping them develop processes to overcome some rather complicated filtering and configuration challenges for recurring project types. It was clear to me that some of their difficulties could be overcome with the use of templates, but I had only recently begun to use these productively myself; my attempts to communicate the subject matter mostly overwhelmed the group, and examining the sample templates provided with the memoQ installation simply made matters worse.

After several frustrating tutorial sessions and the failed acceptance of a template I had tailored to a rather long wish list of automation that came out of our discussions, I decided that the only way to make the value of templates clear to this group of professionals was to wipe the slate clean, forget the myriad "wishes" and build a few simple templates that did just a few simple things, starting from a new configuration with nothing in it at all. Surprisingly, less was indeed more, and the frustrated people began to "get it".

At almost the same time, my friend and colleague Marek Pawelec, a gifted teacher whom I often refer to quite objectively as "a consultant's consultant", mentioned that he was thinking of writing a book on memoQ templates, because he found that most people were unable to avoid the problems in the example templates provided with the memoQ installation, nor could they work out most difficulties encountered when making their own. I could understand this very well, because the user interface of the template configuration dialog is not a stellar example of clarity, and it took me years to make proper sense of much of it. Disappointing, really, because I had been part of the chorus begging for something like templates for years, but when they were delivered, little about them made obvious sense to a dummy like me.

He sent me a chapter he had drafted, and I noted that he had adopted the same reductionist approach to getting started: a template with just a pick list or two for metadata, to avoid the problem I've had for years of accidentally using different designations for the same clients, subjects, domains, etc. He had independently come to the same conclusion: the best way to help people use templates effectively is to start with one or two simple things they do all the time but often mess up.

That interesting draft chapter took time to evolve into a full-fledged guide of nearly 70 pages, with many practical, relatable examples of the kinds of challenges that individual translators (and many other service providers) often face in configuring translation projects. The topics cover the full range of options, from very simple tasks to extremely complex workflows involving pre-import scripts for preparing translation data and post-processing to recreate the original data formats. At every stage he offers clear examples and guidance on how to make things work in cases I have seen time and again in more than two decades of commercial translation work.

I had the pleasure to edit two drafts of this work as it neared completion. And pleasure really is the right word to use here. Marek has a very different explanatory style than mine, but one which I prefer for my own education. He manages very well the deep dive into messy details without drowning the reader in jargon and other unhelpful complexity. His guide gives valuable suggestions and information for every level of expertise. Much of the content can be understood and applied by unsophisticated new users of memoQ, but some of the details on content connectors and scripting can light a chandelier full of bulbs in the heads of alleged experts like myself.

Templates for Translators is an essential reference work for all memoQ users in my opinion, the sort of thing which ought to have been provided seven years or so ago when templates were introduced. Instead we got some imperfect examples which too often - especially in the hands of under-trained PMs at translation agencies - result in unworkable projects with 50+ translation memories and term bases grinding performance to a halt or a lot of mysterious and unwanted automation that does stupid shit like write unfinished and defective translations directly into one's master TM.

In addition to explaining clearly how to create your own helpful project shortcuts and automation from scratch, Marek includes a great chapter describing in detail the templates provided for local projects: what works in them, what doesn't, and how to fix any issues so things work right for you. Even if you are a server user working primarily with online projects, there is a wealth of material in this version of the guide to help you work more effectively with templates. A second edition planned for later this year will cover the additional template features of memoQ Server projects, but the real problems of most people working with those are addressed by the basics presented in this "translators" edition; they do not stem from a lack of guidance on the many extra event "triggers" for online projects or other such details. So if you are a server user, don't wait for the later edition: get this guide now, read every damned page and try to contain your exuberance as you finally understand a lot of stuff that has been confusing the Hell out of most of us for a long time. Then, when the "server edition" of the guide is published, you'll be better prepared to absorb the additional information it offers.

This book is now a valued part of my teaching "arsenal", and I recommend it without reservation to every memoQ user who aspires to work independently and create more effective processes for the special needs of various clients and subject matter. If you are a consultant or trainer at a serious level, it could well be considered malpractice to train without some of the information you'll find in Templates for Translators. Yet that is just what I see too often: discussions of templates that glibly use the few defective examples installed with memoQ, with little consideration of how translators should work in the real world with real, common client projects. This book is a welcome aid to move beyond all that and improve our satisfaction with the routine of translation in memoQ.

So for less than the cost of half an hour of consulting, the €30 invested here will save nearly anyone a large multiple of that and continue to pay dividends for a very long time, even if you understand and apply only 10% of the material presented. I charge far, far more to teach people less than that.

memoQ Inside Out: Templates for Translators is available for purchase at https://payhip.com/b/agrxM

May 30, 2022

Cleaning up language variants in memoQ term bases

While the idea of using sublanguage variants, such as UK, US or Canadian versions of English, sounds nice in principle, in practice these often create headaches for users of translation environments such as memoQ, particularly when exchanging glossaries with others but also when viewing and editing the data in the built-in editors. Many times I have heard colleagues and clients express a wish to "go back" and work only with generic variants of a language in order to simplify their management of terminology data. In the video below, I share one method to do so.

At 3:08 in the video, I share a little "aside" about how the exported term data can be edited to mark a term as forbidden (for instance, if its use is not desired by the translation buyer). Other changes to the information are also possible at this stage, such as the addition of context and use information for example. Other data fields from the term base can also be included in the export for cleanup if these play an important role in your memoQ term bases.

For years, users have requested an editing feature in memoQ that would make "unifying" language variants possible, but as you can see in this video tutorial, this possibility already exists and is neither difficult nor time-consuming to implement. 
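For those comfortable with a bit of scripting, the same merge can also be done outside Excel. The sketch below assumes a tab-delimited term base export with one column per sublanguage; the column headers used here are purely illustrative, since the actual headers depend on how your own term base is defined:

```python
import csv
import io

def merge_variants(tsv_text, variant_columns, target_column):
    """Collapse several sublanguage columns into one generic column,
    keeping the first non-empty value in each row. Header names are
    illustrative; check them against your own term base export."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [f for f in reader.fieldnames if f not in variant_columns]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept + [target_column],
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Take the first variant column that actually has a term in it.
        merged = next((row[c] for c in variant_columns if row.get(c)), "")
        writer.writerow({**{k: row[k] for k in kept}, target_column: merged})
    return out.getvalue()
```

For example, merging hypothetical "English_UK" and "English_US" columns into a single "English" column keeps whichever variant each entry happens to have. The result can then be reimported as described above.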

If you do not wish to create a new term base for the cleaned-up data (as shown in the video) but would rather bring it into the same term base, it is important to configure your import settings correctly so that the original data is overwritten and you don't end up with messy duplication of information. This is achieved with the following setting, marked in red:


However, it should be noted that the term base will still have all the now-unused language variants, albeit with no entries for them. These can be removed by unchecking the boxes for the respective language variants in the term base's Properties dialog.

Speaking of the Properties dialog, some may have noted that in recent versions of memoQ there is an automated option for cleaning up those unwanted language variants:


Why bother with the XLSX route then? Well, depending on which version of memoQ you use, that command may not be available in the dialog. More importantly, I find that when merging data from various language variants I often want to do additional editing of the term information, and that really isn't possible when merging language variants in the Properties dialog. Doing the edits in Microsoft Excel gives you an overview of the data and the option to make whatever adjustments may be needed, such as altering the match properties for better hit results or more accurate quality assurance.

May 28, 2022

Filtering formatted text in Microsoft Office files

Recently, I shared an approach to selecting text in a Microsoft Word file with editing restricted to certain paragraphs. This feature of Microsoft Word is, alas, not supported by any translation tool filters of which I am aware, so to import only the text designated for editing it is necessary to go inside the DOCX file (which is just a ZIP archive with the extension changed) and use the XML file which contains the document text with all its format markers.

This approach is generally valid for all formats applied to Microsoft Office files since Office 2007, such as DOCX from Word or PPTX from PowerPoint. I have prepared a video to show how the process of extracting the content and importing it for translation can work:

After translation, the relevant XML file is exported and the original XML is replaced with the translated file inside the archive. If the DOCX or PPTX file was unpacked to get at the XML, the folder structure can then be re-zipped and the extension changed back to create the deliverable translated file.

What I do not show in the video is that the content can also be extracted by other means, such as convenient memoQ project templates using filters with masks to extract directly using various ZIP filter options. But the lower tech approach shown in the video is one that should be accessible to any professional with access to modern translation environment tools which permit filter customization with regular expressions.

Once a filter has been created for a particular format such as red text, adapting it to extract only green highlighted text or text in italics or some other format takes less than a minute in an editor. Different filters are necessary for the same formats in DOCX and PPTX, because unfortunately Microsoft's markup for yellow highlighting, for example, differs between Word and PowerPoint in the versions I tested.
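To give an idea of what such a format-based filter actually matches: in Word's markup, red text appears as a run whose run properties include a `<w:color w:val="FF0000"/>` element, while PowerPoint uses different DrawingML markup for the same visual result, which is why separate filters are needed. Here is a simplified Python sketch of the matching idea, not a production-ready filter, since real documents can split phrases across runs and vary attribute order:

```python
import re

# Simplified pattern for WordprocessingML: a run (<w:r>) whose properties
# include a red color element, capturing the text in its <w:t> element.
# The tempered dot (?:(?!</w:r>).) keeps the match inside a single run.
RED_RUN = re.compile(
    r'<w:r>(?:(?!</w:r>).)*?<w:color w:val="FF0000"\s*/>'
    r'(?:(?!</w:r>).)*?<w:t[^>]*>(.*?)</w:t>',
    re.DOTALL,
)

def red_text(document_xml):
    """Return the text of runs formatted in pure red (FF0000)."""
    return RED_RUN.findall(document_xml)
```

Swapping the color element for, say, `<w:highlight w:val="green"/>` or `<w:i/>` adapts the same pattern to highlighted or italic text, which is the under-a-minute edit described above.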

Although this is a bit of a nerdy hack, it's probably easier for most people than various macro solutions to hide and unhide text. And it takes far less time and is more accurate than copying text to another file.

In cases where it is important to see the original context of the text being translated, memoQ's PDF Preview Tool can help: this viewer, available in recent versions, tracks the imported text in a PDF made from the original file. Such a PDF can be created with the PDF save options available in Microsoft applications.


May 5, 2022

Understanding and mastering tags... with memoQ!

Everything you need to know... in 36 pages!

Following up on the success of his excellent guide to machine translation functions in memoQ, Marek Pawelec (Twitter: @wasaty) has now published his definitive guide to tag mastery in that translation environment. In a mere 36 pages of clearly written, engaging text, he has distilled more than a decade of personal expertise and exchanges with other top professionals in language services technology into simple recipes and strategies for success with situations which are often so messy that even experienced project managers and tech support gurus wail in despair. Garbage like this, for example:


This screenshot is taken from the import of The PPTX from Hell, which a frustrated PM asked for help with just as I began reviewing the draft of Marek's book about a month ago. It contained nearly 32,000 superfluous spacing tags and was such a mess that it choked all the best professional macros usually deployed to deal with such things. Last year, I had developed my own way of dealing with these things that involved RTF bilingual exports and some search and replace magic in Microsoft Word, but when I shared it with Marek, he said "There's a better way", and indeed there is. On page 23 of this book. It was much cleaner and faster, and in a few minutes I was able to produce a clean slide set that was much easier to read and translate in the CAT tool. A page that costs 50 cents (of the €18 purchase price of the guide) earned me a 140x return and saved hours of working frustration for the translation team.

The book covers a lot more than just the esoterica of really messed up source files. It is a superb introduction to dealing with tags and markup for students at university and for those new to the translation profession and its endemic technologies, and it has sober, engaging guidance at every level for experienced professionals. I consider it an essential troubleshooting work for those in support roles of internal translation departments and, quite honestly, for my esteemed colleagues in First Level Support at memoQ. Marek is a superb trainer and an articulate teacher, with a humility that masks expertise which very often surprises, delights and informs those of us who are sometimes thought to be experts.

I am also particularly pleased that in the final version of his text he addresses the seldom discussed matter of how to factor markup into cost quotations and service charges for translations. memoQ is particularly well designed to address these problems, because weighting factors equivalent to word or character counts can be incorporated in file statistics, offering a simple, transparent and fair way of dealing with the frustrations that too often leave project managers screaming and crying in frustration shortly before... or after planned deliveries.
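The arithmetic behind such weighting is simple enough to sketch: if each tag counts as some fraction of a word, the billable volume is the word count plus the weighted tag count. A minimal illustration follows; the weight and rate are made-up numbers for the example, not recommendations, and memoQ's own statistics let you set the weighting as you see fit:

```python
def weighted_count(words, tags, tag_weight=0.25):
    """Count each tag as a fraction of a word. The 0.25 default here
    is an illustrative choice, not a recommended value."""
    return words + tags * tag_weight

def quote(words, tags, rate_per_word, tag_weight=0.25):
    """Price a job on the tag-weighted volume, rounded to cents."""
    return round(weighted_count(words, tags, tag_weight) * rate_per_word, 2)
```

So a hypothetical 10,000-word job carrying 2,000 tags, priced at €0.10 per word with tags weighted at a quarter word each, bills as 10,500 weighted words, or €1,050 rather than €1,000: the markup burden is made visible and chargeable instead of being absorbed in frustration.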

Whatever aspect of tags may interest you in translation technology and most particularly in memoQ, this book will give you the concise, clear answers you need to understand the best actions to take.

The PDF e-book is available for purchase here: https://payhip.com/b/tHUDx


Forget the CAT, gimme a BAT!

It's been nine months since my last blog post. Rumors and celebrations of my demise are premature; I have simply felt a profound reluctance to wade in the increasingly troubled waters of public media and the trendy nonsense that too often passes for professional wisdom these days. And in pandemic times, when most everything goes online, I feel a better place for me is in a stall to be mucked or sitting on a stump somewhere watching rabbits and talking to goats, dogs or ducks. Certainly they have a better appreciation of the importance of technology than most advocates of "artificial intelligence".


But for those more engaged with such matters, a recent blog post by my friend and memoQ founder Balázs Kis, The Human Factor in the Development of Translation Software, is worth reading. In his typically thoughtful way, he explores some of the contradictions and abuses of technology in language services and postulates that

... for the foreseeable future, there will be translation software that is built around human users of extraordinary knowledge. The task of such software is to make their work as efficient and enjoyable as possible. The way we say it, they should not simply trudge through, but thrive in their work, partially thanks to the technology they are using. 

From the perspective of a software development organization, there are three ways to make this happen:  

  • Invent new functionality 
  • Interview power users and develop new functionality from them 
  • Go analytical and work from usage data and automate what can be automated; introduce shortcuts 

I think there is a critical element missing from that bullet list. Some time ago, I heard about a tribe in Africa where the men typically carry one tool with them into the field: a large knife. Whatever problem they might encounter is to be solved with two things: their human brains and, optionally, that knife. In a sense, we can look at good software tools in a similar way, as that optional knife. Beyond the basic range of organizing functions one can expect from most modern translation environment tools, the solution to a challenge is more often found in how we use our human brains to consider the matter than in the actual tool we use. So, from a user perspective and from the perspective of a software development organization, thriving work depends not so much on features as on a flexible approach to problem solving: an understanding of the characteristics of the material challenge and of the possibilities, often not adequately discussed, of the available tools. But developing the capacity to think frequently seems much harder than "teaching" what to think, which is probably why the former approach is seldom found in professional language service training, even when the trainers may earnestly believe that is what they are facilitating.

I'll offer a simple example from recent experience. In the past year, most of my efforts have been devoted to consulting and training for language technology applications, trying to deal with crappy CMS systems for which developers never gave proper consideration to translation workflows or developing methods to handle really weird outliers like comment translation for distributed PDFs or filtering the "protected" content of Microsoft Word documents with restricted editing to... uh... protect the "restricted" parts.

That editing function in Microsoft Word was new to me despite the fact that I have explored and used many functions of that tool since I was first introduced to it in 1986. I qualify as a power user because I am probably familiar with at least five percent of the program's features, though I am constantly learning new ways to apply that five percent. And the 95% remaining is full of surprises:

Most of the text here can't be edited in MS Word, but default CAT tool filters cannot exclude it.

Only the highlighted text can be edited in the word processor, and that was also the only text to be translated. The real files were much larger than this example, of course, and the text to be translated was interspersed with a lot of text to be left alone. What can you do?

It was interesting to see the various "solutions" offered, some of which involved begging or instructing the customer to do one thing or another, which is not always a practical option. And imagine the hassles of any kind of manual selection, copying and replacement if you have hundreds of pages like this. So some kind of automation is needed, really. Oh, and you can't even hide the protected text. It will import with the default filters of the translation tool, where it will then be indistinguishable from the actual text to be translated and it can be modified. In other words, bye-bye "protection".

What can be done?

There are a number of possibilities that fall short of developing a new option for import filters, which could take years given the often sluggish development cycles for any major CAT tool. One would be...

... to consider that a Microsoft Word DOCX file is really a ZIP archive with a bunch of stuff inside it. That stuff includes a file called document.xml, which contains the actual text of the MS Word document:


That XML file has an interesting structure. All the document text is in one line as one can see when it is opened in a code editor like Notepad++:


I've highlighted the interesting part, the part with the only text I want to see after importing the file for translation (i.e. the text for which editing is not restricted in MS Word). Ah yes, my strategy here is to deal with the XML text container for the DOCX file and ignore the rest. When the question was raised, I knew there must be such a file, but despite exploring the internal bits of MS Office files with ZIP archive tools for about a decade now, I never actually had occasion to poke around inside of document.xml, and I knew nothing of that file's structure. But simple logic told me there must be a marker there somewhere which would offer a solution.

As it turned out, the relevant markers are a set of tags denoting the beginning and end of a text block with editing permission. These can be seen at the start and finish of the text I highlighted in the screenshot. So all that remains is to filter that mess. A simple thing, really.
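For the record, the WordprocessingML elements in question are `w:permStart` and `w:permEnd`, which bracket each range where editing is permitted. A minimal Python sketch of the same block-capturing idea follows; the regex is illustrative only, and the captured blocks still contain run markup that a real filter chain has to handle afterwards:

```python
import re

# w:permStart ... w:permEnd delimit a range where editing is allowed.
# This captures the raw XML between each pair of markers; downstream
# processing must still deal with the run markup inside each block.
PERM_BLOCK = re.compile(
    r'<w:permStart[^>]*/>(.*?)<w:permEnd[^>]*/>',
    re.DOTALL,
)

def editable_blocks(document_xml):
    """Return the XML fragments between editing-permission markers."""
    return PERM_BLOCK.findall(document_xml)
```

This is essentially the same logic expressed declaratively in the filter configuration described below: match the permission markers, keep what lies between them, ignore the rest.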

In memoQ, there is a "filter" which is not really a filter: the Regex Text Filter. It's actually a toolkit for building filters for text-based files, and XML files are really just text files with a lot of funky markup. I don't care about any of that markup except in the blocks I want to import, so I customized the filter settings accordingly:


A smattering of regular expressions went a long way here, and the expressions used are just some of many possible ways to parse the relevant blocks. Then I added the default XML filter after the custom regex text filter, because memoQ makes filter sequencing of many kinds very easy that way. This problem can be solved with any major CAT tool I think, but I don't have to think very hard about such things when I work with memoQ. The result can be sent from memoQ as an XLIFF file to any other tool if the actual translator has other preferences. Oh, the joys of interoperable excellence....

The imported text for translation, with preview 

After translation, document.xml is replaced in the DOCX file by the new version, and the work is done, the "impossible" accomplished without any new features added to the basic toolkit. Computer assistance is all very well, but without brain-assisted translation you're more likely to achieve half the result with double the effort or more.