Translation Tribulations: OmegaT

Jun 25, 2018

OmegaT: free CAT tool, free webinar

Didier Briel, current project manager of the Open Source OmegaT CAT tool, will discuss what makes this language service community resource unique, how it can enable you to work together comfortably in teams with others who use different tools (interoperability) and other interesting matters.

Have a look and see if this is the versatile, multi-platform tool you've been looking for!

Jun 6, 2017

Build your own online reference TM for a team or anyone!

In the past, I have published several articles describing the use of free Google Sheets as a means of providing searchable glossaries on the Internet. This concept has continued to evolve, with current efforts focused on the use of forms and Google's spreadsheet service API to provide even more free, useful functionality.

On a number of occasions I have also mentioned that the same approaches can be used for translation memories to be shared with people having different translation environments, including those working with no CAT tools at all. However, the path to get there with a TM might not be obvious to everyone, and the effort of finding good tools to handle the necessary data conversions can be frustrating.

I've put up a demonstration TM in Portuguese and English here: https://goo.gl/LXXgmf

Here is a selection from the same data collection, selecting for matches of the Portuguese word 'cachorro': https://goo.gl/9KJils
This uses the same parameterized URL search technique described in my article on searchable glossaries.
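For readers who want to experiment with a parameterized search, here is a minimal Python sketch. It assumes that the sheet is shared so that anyone with the link can view it, that the source-language text is in column A, and that the query goes through the Google Visualization API query endpoint (`/gviz/tq`). That URL form may differ from the one described in the glossary article, and the sheet key is a placeholder for the ID in your own sheet's address.

```python
import urllib.parse


def build_tm_search_url(sheet_key: str, term: str, source_col: str = "A") -> str:
    """Return a URL asking Google's query endpoint for all rows whose source
    column contains `term` (case-insensitive), rendered as an HTML table."""
    term = term.lower()
    # The query language has no escape character for quotes, so wrap the
    # term in whichever quote character it does not contain.
    quote_char = '"' if "'" in term else "'"
    query = f"select * where lower({source_col}) contains {quote_char}{term}{quote_char}"
    params = urllib.parse.urlencode(
        {"tqx": "out:html", "tq": query},
        quote_via=urllib.parse.quote,  # encode spaces as %20 rather than '+'
    )
    return f"https://docs.google.com/spreadsheets/d/{sheet_key}/gviz/tq?{params}"


if __name__ == "__main__":
    # "YOUR_SHEET_KEY" is a placeholder for the long ID in the sheet's URL.
    print(build_tm_search_url("YOUR_SHEET_KEY", "cachorro"))
```

Opening the resulting link in a browser should show only the matching rows; the same pattern, with the search term swapped in by the tool, is what a web search configuration in a CAT tool would need.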

A translation memory in a Google Sheet has a few advantages:

It can be made accessible to anyone or to a selected group (using Google's permission scheme)
It can be downloaded in many formats for adding to a TM or other reference source on a local computer
Hits can also be read in context if the TM content is in the order it occurs in the translated documents. This is an advantage currently offered in commercial translation environment tools only by memoQ LiveDocs corpora.

Web search tools of many kinds can be configured easily to find data in these online Google Sheet "translation memories" - SDL Trados Studio, OmegaT and memoQ are among those tools with such facilities integrated, and IntelliWebSearch can bridge the gap for any environment that lacks such a thing.

But... how do you go from a translation memory in a CAT tool to the same content in a Google Sheet? This can be confusing, because many tools do not offer an option to export a TM to a spreadsheet or delimited text file. Some suggestions are found in an old PrAdZ thread, but I found a more satisfactory way of dealing with the problem.

A few years ago, the Heartsome Translation Studio went free and Open Source. It contains some excellent conversion tools. I downloaded a copy of the Heartsome TMX Editor (installers are available for Windows, Mac and Linux) and used it to convert my TMX file.

The result was then uploaded to a public directory on my personal Google Drive, and the URL was noted for building queries. Fairly straightforward.
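Heartsome is not the only route. For those comfortable with a little scripting, the following Python sketch shows the general idea of such a conversion using only the standard library: it reads the translation units of a TMX file and writes source/target pairs to a tab-delimited file that Google Sheets can import. It is a simplification under stated assumptions: one segment per language per unit, language codes matched on their primary subtag (so "pt" matches "pt-PT" and "pt-BR"), and inline formatting codes such as `<ph>` or `<bpt>` dropped, keeping only the plain text around them.

```python
import csv
import io
import sys
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under its namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def seg_text(seg: ET.Element) -> str:
    """Plain text of a <seg>, skipping the content of inline codes but
    keeping the text that follows them (their .tail)."""
    parts = [seg.text or ""]
    for child in seg:
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())  # collapse runs of whitespace


def tmx_to_rows(source, src_lang: str, tgt_lang: str) -> list:
    """Return (source, target) pairs for every <tu> that has both languages."""
    rows = []
    for tu in ET.parse(source).getroot().iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            if seg is not None:
                texts[lang.split("-")[0].lower()] = seg_text(seg)
        if src_lang in texts and tgt_lang in texts:
            rows.append((texts[src_lang], texts[tgt_lang]))
    return rows


def write_tsv(rows, out) -> None:
    """Write a header plus one tab-separated row per translation unit."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "target"])
    writer.writerows(rows)


# A tiny illustrative TMX document (invented content).
SAMPLE_TMX = """<tmx version="1.4"><header srclang="pt-PT"/><body>
<tu><tuv xml:lang="pt-PT"><seg>O cachorro dorme.</seg></tuv>
    <tuv xml:lang="en-US"><seg>The dog is sleeping.</seg></tuv></tu>
<tu><tuv xml:lang="pt-PT"><seg>Clique <ph x="1">&lt;b&gt;</ph>aqui</seg></tuv>
    <tuv xml:lang="en-US"><seg>Click <ph x="1">&lt;b&gt;</ph>here</seg></tuv></tu>
</body></tmx>"""

if __name__ == "__main__":
    # With a real file, pass its path instead of a BytesIO object.
    rows = tmx_to_rows(io.BytesIO(SAMPLE_TMX.encode("utf-8")), "pt", "en")
    write_tsv(rows, sys.stdout)
```

Saving the output as a .tsv file and importing it with File > Import in Google Sheets should give a two-column sheet ready for sharing.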

The Heartsome TMX Editor seems like it might be a useful tool to replace Olifant as my TMX editor. While the TM editor in my tool of choice (memoQ) has improved in recent years, it still does not do many things I require, and some of this functionality is available in Heartsome.

May 27, 2017

CAT tools for weapons license study

More than a decade ago I found a very useful book on practical corpus linguistics, which has had perhaps the greatest impact of any single thing on the way I approach terminology. Among other things, it discusses how to create special text collections for particular subjects and then mine these for frequently used expressions in those domains. It has become a standard recommendation in my talks at professional conferences and universities as well as in private consultations for terminology.

Slide from my recent talk at the Buenos Aires University Facultad de Derecho

In the last two weeks I had an opportunity to test my recommendations in a somewhat different way than usual. Typically I use subject-specific corpora in English (my native language) to study the "authentic" voice of the expert in a domain that may be related to my own technical specialties but which differs in its use of language in significant ways. This time I used such corpora and other techniques to study subject matter I know reasonably well (the features, use and safety aspects of firearms for hunting), with the aim of acquiring vocabulary and an idea of what to expect on a weapons qualification test in Portugal, where I have lived for several years but have not yet achieved satisfactory competence in the language for my daily routine.

It all started two weeks ago when I attended an all-day course on Portugal's firearm and other weapon laws in Portalegre. Seven and a half solid hours of lecture left me utterly fatigued at the end of the day, but it was an interesting day with many aha! moments as I saw concepts I knew well in German and English presented in Portuguese. Most of the time I looked up words I saw in the slides or in the course textbook prepared by the PSP and made pencil notes on vocabulary in my book.

Twelve days afterward I was scheduled to take a written test, and in the unlikely event that I passed it, I would then face a practical examination on the safe use of hunting firearms and related matters.

Years ago when I studied for a hunting license in Germany I had hundreds of hours of theoretical and practical instruction in a nine-month course concurrent with a one-year understudy with an experienced hunter. Participants in a German hunting course typically read dozens of supplemental books and study thousands of sample questions for the exam.

The pickings are a little slimmer in Portugal.

As far as I am aware, there are no study guides in Portuguese or any other language to help prepare for the weapons tests except the slim book prepared by the police.

There are, however, a number of online forums where people talk about their experiences in the required courses and on the tests. Sometimes there are sample questions reproduced with varying degrees of accuracy, and there is a lot of talk about things which people found particularly challenging.

So I copied and pasted these discussions into text files and loaded them into a memoQ project for Portuguese to English translation. The corpus was not particularly large (about 4000 words altogether), so a statistical survey found only a limited number of term candidates, but these were still useful to someone with my modest vocabulary. I then proceeded to translate about half of the corpus into English, manually selecting less frequent but quite important terms and making notes on perplexing bits of grammar or tricks hidden in the question examples.

A glossary in progress as I study for my Portuguese weapons license

The glossary also contained some common vocabulary that one might legitimately argue does not belong in a specialist glossary, but since these were common words likely to occur in the exam and I did not know them, it was entirely appropriate to include them.
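memoQ's term extraction works in its own way and offers many more options, but the basic idea of a statistical survey for term candidates can be illustrated in a few lines of Python. This sketch simply counts word sequences of one to three words, discards those that begin or end with a function word, and keeps those that occur at least twice. The stop-word list is a tiny, hypothetical sample; a usable list for Portuguese would be far longer.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (hypothetical; a real one is much longer).
STOPWORDS = {"a", "o", "as", "os", "de", "da", "do", "das", "dos", "e",
             "em", "na", "no", "um", "uma", "para", "com", "que", "se", "por"}


def term_candidates(text, max_n=3, min_freq=2):
    """Return (candidate, count) pairs, most frequent first."""
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware: keeps ç, ã, é
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Candidates should not begin or end with a function word.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_freq]


SAMPLE = ("A arma de caça deve ser guardada. A arma de caça tem licença. "
          "Licença de uso e porte de arma.")

if __name__ == "__main__":
    for term, count in term_candidates(SAMPLE):
        print(count, term)
```

Note that "arma de caça" survives as a candidate because the stop word sits in the middle, not at either end; that is the usual reason for filtering only the boundaries.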

Other resources on the subject are scarce. I did find a World War II-vintage military dictionary for Portuguese and English, which can easily be made into a searchable PDF using ABBYY FineReader or other tools, but not much else.

Any CAT tool would have worked equally well for my learning objectives - the free tools AntConc and OmegaT are in no way inferior to what memoQ offered me.

On the day of the test, I was allowed to bring a Portuguese-to-English dictionary and a printout of my personal glossary. However, the translation work that I did in the course of building the glossary had imprinted the relevant vocabulary rather well on my mind, so I hardly consulted either. I was tired (having hardly slept the night before) and nervous (so that I mixed up the renewal intervals for driver's licenses and hunting licenses), and I just didn't have the stamina to pick apart some particularly long, convoluted sentences, but in the end I passed with a score of 90% correct. That wouldn't win me any kudos with a translation customer, but it allowed me to go on to the next phase.

Practical shooting test at the police firing range

During the day of lectures, I dared to ask only one question, and I garbled it so badly that the instructor really didn't understand, so I was not looking forward to the oral part of the exam. But much to my surprise, I understood all the instructions on exam day, and I was even able to joke with the policeman conducting the shooting test. In the oral examination, in which I had to identify various weapons and ammunition types and explain their use and legal status, and in the final part, where I went on a "hunt" with a police commissioner to demonstrate that I could handle a shotgun correctly under field conditions and respond appropriately to a police check, I had no difficulties at all except remembering the Portuguese word for "trigger lock". All the terms I had drilled for passive identification in the written exam had unexpectedly become active vocabulary, and I was able to hold my own in all the spoken interactions - not a usual experience in my daily routine.

The same professional tools and techniques that I rely on for my daily work proved far more effective as learning aids for my examination, and useful in a much broader scope, than I had expected. I am confident that a similar application could be helpful in other areas where my understanding and active use of Portuguese are still weak.

If it works for me, it is reasonable to assume that others who must cope with challenges of a test or interactions of some kind in a foreign language might also benefit from learning with a translator's working tools.

Feb 21, 2015

CAT tools re-imagined - an approach to authoring and editing

I am often asked about the monolingual editing workflows I have used for some 15 years now to improve texts which were originally written in English rather than translated from another language. I have also discussed various corpus linguistics approaches, such as using corpora to learn the language of a new specialty, or the NIFTY method often presented by colleague Juliette Scott.

However, on a recent blitz tour of northern Portugal to test the fuel performance of the diesel wheels which may take me to the BP15 and memoQfest conferences in Zagreb and Budapest respectively later this year, I stopped off in Vila Real to meet a couple of veterinarians, one of whom is also a translator. During a lunch chat over typically excellent Portuguese cuisine, the subject of corpus research as an aid for authoring a review paper came up. I began to explain my (not so unusual) methods of editing an existing document when I was asked how the tools of translation technology might be applied to authoring original content.

The other translator at the table said, "It's a shame that I cannot use my translation memories to look things up while I write", and I replied that of course he could do this, for example with the memoQ TM Search Tool or similar solutions from other providers. And then he said, "And what about my term bases and LiveDocs corpora?", and I said I would sleep on it and get back to him. In the days that followed, other friends (coincidentally also veterinarians) asked my advice about editing the English of the Ph.D. theses and other works they will author in English as non-native speakers of that language. One of them noted that it would be "nice" if she could refer to corrections made by various persons and compare them more easily. I said I would sleep on that one too.

A few days after that the pain in my hands and feet from repetitive strain injuries and arthritis was unbearable, aggravated by a rope burn accident while stopping an attack on sheep by my over-eager hunting dog and by driving over 1000 km in a day. I doubled down on the pain meds, made a big jug of toxically potent sangria and otherwise ensured that I was comfortably numb and could enjoy a night of solid sleep.

It was not meant to be. Two hours later I woke up, stone sober, with a song in my head and the solution to the problem of my Portuguese friends writing in English and Tiago wanting to author his work in memoQ for the convenience of using its filters to review content. Since then the concept has continued to evolve and improve as others suggest ways of accommodating their writing or language learning needs.

After about a week of testing I scheduled one of my "huddle" presentation classes, an intimate TeamViewer training session to discuss the approach and elicit new ideas for adapting it better to the needs of monolingual authors. The recording of that session is available for download by clicking on the image of the title slide at the top of this post. (The free TeamViewer software is needed to watch the downloaded TVS file; double-click it, and the 67-minute lecture and Q&A will play.)

I'm currently building Moodle courses which provide more details and templates for this approach to authoring and editing, and the approach will also be incorporated into parts of the many talks and workshops I have planned this year.

I am aware that SDL killed their authoring product, the Author Assistant, and that Acrolinx offers interesting tools in this area, as do others. But I'm usually hesitant to recommend commercial tools in an academic environment, because their often rapid pace of development (such as we see with memoQ) can play serious havoc with teaching plans and threaten the stability of an instructional program, which is usually best focused on concepts and not on fast-changing details. So I actually started out my work and testing of this idea using the Open Source tool OmegaT, the features of which are more limited but also more stable in most cases than the commercial solutions from SDL, Kilgray and others. But as I worked, I noticed that my greater familiarity with memoQ's features made it an advantageous platform for developing an approach, which in principle works with almost every translation environment tool.

Part of my motivation in creating this presentation was to encourage improvements in the transcription features available in some translation environments. But the more I work with this idea, the more possibilities I see for extending the reach of translation technology into source text authoring and making all the resources needed for help available in better ways. I hope that you may see some possibilities for your own work or learning needs and can contribute these to the discussion.

Jul 13, 2014

Mice Like Us

Among my great passions are myths and children's stories. The transformative, symbolic qualities of the good ones carry forward ideas, moral and ethical concepts in ways few classrooms can, and even bad ones may communicate at a level many a gifted orator cannot.

In many ways, the translators I know are like mice. They see themselves as small compared to the great Bridges Lying Across their peripatetic professional paths, easy prey for the More Ravenous, to be consumed perhaps by HAMPsTr hordes or Transformed Perfectly into the pigturds polluting the waters of roadside ditches.

The Merchants of the Machine - and you know who they are - have a story line consumed gladly by those who, placing presumed balance sheet profits ahead of real producers and lacking a long-term commitment to service and the interests of those from whom they extract toil and cash, position themselves as transformers of communication and translation, surfing the Big Wave of Big Data to a Bigger Future. Humans are fallible, alas, but the miraculous Machine in its comprehensible simplicity shall save us from the messy human mystery and lead us to a calculable future, a Thousand Years of Grace and Prosperity for the Chosen in control of the channels of distribution and marketing magic. But real life isn't like that.

We need a different narrative. As Dr. Bronowski said, "We have to close the distance between the push-button order and the human act." In alternative narratives, the mouse is not always the easy prey to the CAT, nor to any other creature. It's a matter of attitude, and sometimes organization.

From the simple act of kindness in Aesop's tale to the complex world of Redwall, we mice can read examples of how those seen as small and insignificant can in fact be the key to survival and triumph. Lacking the bluster and flames of the Great Beasts of our "industry" we must instead rely on those most essential tools, our brains, to come out ahead in the asymmetric competition.

Technology is, in fact, on our side as translators when it is used in conjunction with BAT*. There are Open Source tools available for organizing the work of individual translators or teams which, in clever hands, can compete at almost every level with the finest of commercial technology. OmegaT, Rainbow and GlobalSight are just a few of a long list of these. And for the less "clever" (or those who prefer a bigger slice of normal life) there are simple software service offerings like Kilgray's memoQ cloud, which puts my freelance team on equal footing with any agency or corporate department using the latest and greatest technologies for their language processes. All this for a fraction of what my monthly phone bill used to be in the days before flatrates and Skype.

So what will it be? Will you be willing meat for some weasel's pot?

Caught in a trap of your own denial, uninformed belief and fear, listening to naught but Common Nonsense?

Or, like the mice of Redwall, will you gather your strength and skills, apply them in concert with like-minded professionals in your own interest and the interest of the public you serve and partake of the great feast on a table set for all who will come?

* Brain-assisted translation

Jun 4, 2014

OmegaT’s Growing Place in the Language Services Industry

Guest post by John Moran

As both a translator and a software developer, I have much respect for the sophistication of the well-known proprietary standalone CAT tools like memoQ, Trados, DejaVu and Wordfast. I started with Trados 2.0 and have seen it evolve over the years. To greater and lesser extents these software publishers do a reasonable job at remaining interoperable and innovating on behalf of their main customers - us translators. Kudos in particular to Kilgray for using interoperability standards to topple the once mighty Trados from its monopolistic throne and forcing SDL to improve their famously shoddy customer support. Rotten tomatoes to Across for being a non-interoperable island and having a CAT tool that is unpopular with most (but curiously not all) of the freelance translators I work with in Transpiral.

But this piece is about OmegaT. Unlike some of the other participants in the OmegaT project, I became involved with OmegaT for purely selfish reasons. I am currently in the hopefully final stage of a Ph.D. in computer science with an Irish research institute called the Centre for Next Generation Localisation (www.cngl.ie). I wanted to gather activity data from translators working in a CAT tool for my research in a manner similar to a translation process research tool called TransLog. My first thought was to do this in Trados as that was the tool I knew best as a translator but Trados’ Application Programming Interface did not let me communicate with the editor.

Thus, I was forced to look for an open-source CAT tool. After looking at a few alternatives like the excellent Virtaal editor and a really buggy Japanese one called Benten I decided on OmegaT.

Aside from the fact that it was programmed in Java, a language I have worked with for about ten years as a freelance programmer, it had most of the features I was used to working with in Trados. I felt it must be reliable if translators were downloading it 4000 times every month. That was in 2010. Four years later that number is about to reach 10,000. Even if most of those downloads are updates, it should be a worrying trend for the proprietary CAT tools: considering that SDL reports having 135,000 paid Trados licenses in total, that is a significant number.

Having downloaded the code, I added a logging feature to it called instrumentation (the “i” in iOmegaT) and programmed a small replayer prototype. Imagine pressing a record button in Trados and later replaying the mechanical act of crafting the translation as a video, character-by-character or segment-by-segment, and you will get the picture. So far we use the XML it generates mainly to measure the impact of machine translation on translation speed relative to not having MT. Funnily enough, when I developed it I assumed it would show me that MT was bunk. I was wrong. It can aid productivity, and my bias was caused by the fact that I had never worked with useful trained MT. My dreams of standing ovations at translator association meetings turned to dust.
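The record-and-replay idea can be illustrated with a toy event log. This Python sketch is not iOmegaT's actual data format (which is XML and far richer); it assumes a hypothetical event structure in which each edit records a timestamp, a segment number, a character position, the text inserted and the number of characters deleted. Replaying the log up to any moment rebuilds what the target segments looked like at that moment.

```python
from dataclasses import dataclass


@dataclass
class EditEvent:
    """One hypothetical logged edit in a target segment."""
    t_ms: int       # milliseconds since the session started
    segment: int    # segment number in the project
    pos: int        # character offset where the edit happens
    inserted: str   # text typed at pos (may be empty)
    deleted: int    # number of characters removed starting at pos


def replay(events, until_ms=None):
    """Rebuild each segment's target text by applying events in time order,
    optionally ignoring everything after the moment until_ms."""
    targets = {}
    for e in sorted(events, key=lambda ev: ev.t_ms):
        if until_ms is not None and e.t_ms > until_ms:
            break
        text = targets.get(e.segment, "")
        targets[e.segment] = text[:e.pos] + e.inserted + text[e.pos + e.deleted:]
    return targets


if __name__ == "__main__":
    log = [
        EditEvent(0, 1, 0, "Teh dog", 0),    # typo typed in segment 1
        EditEvent(800, 1, 1, "he", 2),       # "eh" replaced by "he"
        EditEvent(1500, 2, 0, "Barks.", 0),  # segment 2 typed
    ]
    for moment in (0, 800, 1500):
        print(moment, replay(log, moment))
```

Stepping `until_ms` forward in small increments is, in essence, the "video" of the translation process described above.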

If I can’t beat MT I might as well join it. About a year and a half ago, using a government research commercialization feasibility grant, I was joined by my friend Christian Saam on the iOmegaT project. We studied computational linguistics in Ireland and Germany on opposite sides of an Erasmus exchange programme, so we share a deep interest in language technology and a common vocabulary. We set about turning the software I developed in collaboration with Welocalize into a commercial data analysis application for large companies that use MT to reduce their translation costs.

However, MT post-editing is just one use case. We hope to be able to use the same technique to measure the impact of predictive typing and Automatic Speech Recognition on translators. I believe these technologies are more interesting to most translators as they impose less on word order.

At this point I should point out that CNGL is a really big research project with over 150 paid researchers in areas like speech and language technology. Localization is big business in Ireland. My idea is to funnel less commercially sensitive translator user activity data securely, legally, transparently and, in most cases, anonymously from translators using instrumented CAT tools into a research environment to develop and, most importantly, test algorithms to help improve translation productivity. Someone once called it telemetry for offline CAT tools. Translation companies take NDAs very seriously, but many modern content types like User Generated Content and technical support responses appear on websites almost as soon as they are written in the source language, so my hope is that a controlled but automated data flow may be feasible. In the future it may also be possible to test algorithms for technologies like predictive typing without uploading any linguistic data from a working translator’s PC. Our bet is that researchers are data-tropic. If we build it they will come.

We have good cause to be optimistic. Welocalize, our industrial partner, is an enlightened kind of large translation company. They have a tendency to want to break down the walls of walled gardens. Many companies don’t trust anything that is free, but they know the dynamics of open-source. They had developed a complex but powerful open-source translation management system called GlobalSight, and its timing was propitious.

It was released around the same time SDL announced they were mothballing their TMS system to replace it with the newly acquired Idiom WorldServer (now SDL WorldServer). This panicked a number of corporate translation buyers, who suddenly realized how deeply networked their translation department was via its web services and how strategically important the SDL TMS system was. As the song goes, "you don’t know what you’ve got till it’s gone" – or, in this case, nearly gone.

SDL ultimately reversed the decision to mothball SDL TMS and began to reinvest in its development, but that came too late for some corporates, who migrated en masse to GlobalSight. It is now one of the most widely implemented translation management systems in the world among technology companies and the Fortune 500. A lot of people think open-source is for hippies, but for large companies open-source can be an easy sell. They can afford engineering support, department managers won’t be caught with their pants down if the company doing the development ceases to exist, and most importantly their reliance on SDL’s famously expensive professional services division is reduced to zero. If they need a new web-service, they can program it themselves. GlobalSight is now used by many companies, both customers of Welocalize and others, like Intel, that are not. Across should pay heed. At a C-Suite level corporates don’t like risk.

However, GlobalSight had a weakness. Unlike Idiom WorldServer, it didn’t have its own free CAT tool. Translators had a choice of download formats and could use Trados, but Trados licenses are expensive and many translators are slow to upgrade. Smart big companies like to have as much technical control of their supply chain as possible, so Welocalize were on the lookout for a good open-source CAT tool. OpenTM2 was a contender for a while, but it proved unsuitable. In 2012 they began an integration effort to make OmegaT compatible with GlobalSight. When I worked with Welocalize as an intern I saw wireframes for an XLIFF editor on the wall, but work had not yet started. Armed with data from our productivity tests, and with the support of Didier Briel, the OmegaT project manager, who was in Dublin to give a talk on OmegaT, I made the case for integrating OmegaT with GlobalSight. It was a lucky guess. Two years later it works smoothly and both applications benefit from each other.

What did I have to gain from this? Data.

So why this blog? Next week I plan to present our instrumentation work at the LocWorld tradeshow and I want Kilgray to pay heed. OmegaT is a threat to their memoQ Translator Pro sales and that threat is not going to reduce with time. Christian and I have implemented a sexy prototype of a two-column working grid, and we can do the same trick importing SDL packages with OmegaT as they do with memoQ. Other large LSPs are beginning to take note of OmegaT and GlobalSight.

However, I am a fan of memoQ, and even though the poison pill has been watered down to homeopathic levels, I also like Kilgray’s style. The translator community has nothing to gain if a developer of a good CAT tool suffers poor sales. This reduces manpower for new and innovative features. Segment-level A/B testing using time data is a neat trick. The recent editing time feature is a step in the right direction, but it could be so much better. The problem is that CAT tools waste inordinate amounts of translator time, and the recent trend towards CAT tools connected to servers makes that even worse. Slow servers that are based on request-response protocols instead of synchronization protocols, slow fuzzy matches, bad MT, bad predictive typing suggestions, hours wasted fixing automatic QA to catch a few double spaces. These are the problems I want to see fixed using instrumentation and independent reporting.

So here is my point in the second person singular. Kilgray – I know you read this blog. Listen! Implement instrumentation and support it as a standard. You can use the web platform Language Terminal to report on the data or do it in memoQ directly. On our side, we plan to implement an offline application and web-application that lets translators analyse that data by manually importing it so they can see exactly how much they earn per hour for each client in any CAT tools that implement that standard. €10 says Trados will be last. A wise man once said you get the behavior you incentivize, and the per-word pricing model incentivizes agencies to not give a damn about how much a translator earns per hour. The important thing is to keep the choice about sharing translation speed data with the translator but let them share it with clients if they want to. Web-based CAT tools don’t give them that choice, so play to your strengths. Instrumentation is a powerful form of telemetry and software QA.
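The arithmetic behind the earnings-per-hour report proposed here is simple once working time is logged. In this hypothetical Python sketch, each job record holds the client name, the word count, the per-word rate and the working time in seconds; the figures in the example are invented for illustration, not measured data. For instance, 3,000 words at €0.10 per word that took six hours come to €300 / 6 h = €50 per hour.

```python
from collections import defaultdict


def hourly_earnings(jobs):
    """jobs: iterable of (client, words, rate_per_word, seconds_worked).
    Returns {client: earnings per hour} for clients with logged time."""
    pay = defaultdict(float)
    seconds = defaultdict(float)
    for client, words, rate, secs in jobs:
        pay[client] += words * rate
        seconds[client] += secs
    return {c: pay[c] / (seconds[c] / 3600) for c in pay if seconds[c] > 0}


if __name__ == "__main__":
    # Invented example figures, not measured data.
    jobs = [
        ("Agency A", 2000, 0.10, 4 * 3600),
        ("Agency A", 1000, 0.10, 2 * 3600),
        ("Agency B", 3000, 0.08, 3 * 3600),
    ]
    for client, per_hour in hourly_earnings(jobs).items():
        print(f"{client}: €{per_hour:.2f}/hour")
```

In this example the client paying the lower per-word rate yields the higher hourly income (€80 versus €50), which is exactly the kind of difference per-word pricing hides.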

So to summarize: OmegaT’s place in the language services industry is to keep proprietary CAT tool publishers on their toes!

*******

See also the CNGL interview with Mr. Moran....

Feb 1, 2014

The fix is in for PDF charts

Over four years ago, I reviewed Iceni Infix after I began working with it. I'm not as strong a fan as some, because I generally have little enthusiasm for direct editing of PDFs and dealing with frequent problems such as missing unusual fonts and having to play the guess-my-optimum-font-substitution game, but I do find it useful in many situations. I found another one of those today.

A new client of a friend works with a horrible German program to produce reports full of charts. The main body of the text is written in Microsoft Word and is available as a reasonable DOCX file, but the charts are a problem, as they are available only in that oddball program's own format or as PDF. Nobody wants to deal with that software, really. It is supported by no translation tools vendor I am aware of, and like another example of incompatible German software, Across, it enjoys the obscurity it deserves.

After thinking about the approach needed in this case, I realized that if the graphics could be isolated conveniently on pages, the XML export from the PDF document would contain only information from the graphics. After translation, the format could be touched up with Infix before making bitmap screenshots at an enlargement which would yield decent resolution when sized in layout. Of course, in projects involving multiple languages the XML files could be used with great convenience.

Selecting and deleting the text on the pages with Iceni Infix is really a no-brainer. The time charge for such work will be quite reasonable. And exporting the XML or marked-up text to translate is also quite straightforward.

The exports can be handled in nearly any CAT tool, so TMs and terminology resources can be put to full use. Or you can edit in a simple, free tool like Notepad++ or an XML-savvy editor.

The screenshot above shows the XML in memoQ. No customization of the default filter is required. Reports from other users who have worked in a similar way indicate that OmegaT and other environments generally have few, if any, problems. In one case there was trouble re-integrating the graphics in a project that also had 50 pages of text, but there may have been other issues I am not aware of in that case.
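To show why an XML export is so convenient for translation, here is a minimal Python sketch of the round trip a CAT tool performs: collect the unique text strings, then write translations back into the same structure. The exact schema of Infix's XML export is not covered here, so this assumes a generic, hypothetical layout in which every element's text content is translatable; attributes (such as coordinates) are left untouched, and text that follows child elements is ignored for simplicity.

```python
import xml.etree.ElementTree as ET


def extract_strings(root):
    """Unique, non-blank element texts in document order."""
    seen = {}
    for el in root.iter():
        text = (el.text or "").strip()
        if text:
            seen.setdefault(text, None)  # a dict keeps first-seen order
    return list(seen)


def apply_translations(root, translations):
    """Replace each element's text with its translation where one exists."""
    for el in root.iter():
        text = (el.text or "").strip()
        if text in translations:
            el.text = translations[text]
    return root


# Hypothetical chart-label export; the x/y attributes stand in for layout data.
SAMPLE = ('<page><text x="10" y="20">Umsatz 2013</text>'
          '<text x="10" y="40">Kosten</text><text x="60" y="40">Kosten</text></page>')

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(extract_strings(root))  # each label only once, ready for a TM
    apply_translations(root, {"Umsatz 2013": "Revenue 2013", "Kosten": "Costs"})
    print(ET.tostring(root, encoding="unicode"))
```

Repeated labels, like the two instances of "Kosten", become a single string to translate, and for several target languages the same extraction is simply paired with a different translation table.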

With the content in the TM, if the chart data are made available in another format, the translations can be transferred quickly to that for even better results. The same approach can be used for a very wide variety of other electronically generated graphic formats (except some of the really insane ones I've seen where the text is broken up; I don't know if Iceni sanitizes such messes or not).

I think this is an approach which can benefit many of us in a variety of projects. It is not really suited for cases of bitmap graphics, but I have other approaches there in which Iceni Infix may also play a useful role and allow CAT integration. Licenses for the tool are quite reasonably priced, and the trial version (in Pro mode) is entirely suited for testing and learning this process.

Jan 1, 2014

The 2013 translation environment tools survey

From mid-October until the end of 2013, I placed two small survey questions at the top of the blog page and publicized these in a variety of user forums. The questions were similar to two posed in 2010, because I was interested to see how things might have changed. This is, of course, an informal survey with a number of points in its "methodology" wide open to criticism, though its results are certainly more reliable than anything one can expect from the Common Sense Advisory :-) My personal interest here was to get an idea of the background readers here might have with various translation environment tools, because it is useful to know this when preparing posts on various subjects. Here is a quick graphic comparison of the 2010 and 2013 results:

Responses to the question about the number of translation environment tools were very similar in both cases. About half use only one, with between 25 and 30% of respondents using a second tool and increasingly small numbers going beyond that. The question posed covered preparation, translation and checking in projects, so some respondents using multiple tools may be translating and maintaining terminologies and translation memories in only one tool. I am encouraged by this result, as it means that despite changes in the distribution of particular tools, users are exercising good ergonomic sense and predominantly sticking to one for their main work. Everyone benefits from this: translators generally work more efficiently without tool hopping, and more effort is focused on what clients need - a good translation.

In 2010, half the respondents cited the use of some version of "SDL Trados" (more details on this were provided in a later survey); the next highest responses at just under 20% were for Déjà Vu and memoQ. Three and a half years later, Atril's share of users appears to have declined considerably, and the use of memoQ appears to be about on par with SDL Trados Studio. OmegaT, an excellent free and Open Source translation support tool capable of working with translation formats from the leading tools, appears to be doing better than many of the commercial tools in the survey, which should not surprise anyone familiar with that software.

Across continues to be a loser in every way. Despite massive efforts in the low end of the market to promote this incompatible Teutonic travesty and the availability of the client software free of charge to its victims (translators), no real progress has been made in the Drang nach Marktanteil. One would expect that a good solution supported by a competent professional development team and a marketing budget, available free to translators, would easily beat the low-profile OmegaT. And I am sure that it would. That premise simply doesn't apply to Across, which drives some of the most technically competent translators I know completely berserk. The fact that OmegaT is about twice as popular despite its volunteer development and total lack of marketing budget speaks volumes.

More important than any of the individual figures for translation support tools are some of the implications for interoperable workflows that the numbers reveal. Most of the tools listed support XLIFF, so if you use a tool capable of exporting and reimporting translation content as XLIFF, developing an interoperable workflow for translation and review that will work with the majority of tools will probably not be that difficult. An XLIFF file from SDL Trados Studio or memoQ, for example, is usually a no-brainer for translation in Déjà Vu, OmegaT, CafeTran or Fluency, and any concerns can be checked quickly with a "roundtrip test" using pseudotranslation or simply by copying the source text to the target.
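For those who want to script such a roundtrip test rather than rely on a tool's built-in pseudotranslation, here is a minimal Python sketch. It assumes a plain XLIFF 1.2 file; SDLXLIFF and other tool-specific flavors add their own namespaces and segment markup (seg-source, mrk), which this sketch does not handle. The function names and the vowel-accenting rule in pseudo() are my own hypothetical choices for illustration, not any tool's method.

```python
import copy
import xml.etree.ElementTree as ET

# Assumption: a plain XLIFF 1.2 file. SDLXLIFF and other tool-specific flavors
# add their own namespaces and <seg-source>/<mrk> markup, not handled here.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}
ET.register_namespace("", XLIFF_NS)  # write the namespace back as the default


def pseudo(text):
    """Hypothetical pseudotranslation rule: bracket the text, accent some vowels."""
    if not text:
        return text
    return "[" + text.replace("a", "à").replace("e", "é") + "]"


def fill_targets(xliff_string, transform=None):
    """Copy every <source> to a fresh <target>, optionally transforming its text."""
    root = ET.fromstring(xliff_string)
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        source = unit.find("x:source", NS)
        if source is None:
            continue
        for old in unit.findall("x:target", NS):
            unit.remove(old)  # start from a clean target
        target = copy.deepcopy(source)
        target.tag = f"{{{XLIFF_NS}}}target"
        if transform:
            # Change only text content; inline tags such as <g> or <x/> stay intact
            for node in target.iter():
                node.text = transform(node.text)
                if node is not target:
                    node.tail = transform(node.tail)
        # Place the new target directly after its source
        unit.insert(list(unit).index(source) + 1, target)
    return ET.tostring(root, encoding="unicode")
```

Import the result into the tool the file came from and export the target document: if that works cleanly, the format is probably safe to exchange. Pseudotranslation has the advantage of showing at a glance which text is translatable and whether expanded text breaks anything.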

While individual tools have largely improved in their mutual compatibility and ability to share translation and resource data, there is legitimate continuing concern about the increased use of translation servers by translation agencies and corporations with volume needs who manage their own translation processes. Jost Zetzsche and I have expressed concerns in the past regarding the lack of compatibility between server platforms and various clients, though with the appropriate use of exchange formats, this can still be overcome.

The greatest challenge I have seen with server-based work is that the people creating and "managing" projects on these servers often lack a basic understanding of the processes involved, so the skills of translators competent with a particular client tool may be effectively nullified by an incompetently prepared job. I experienced this myself recently on a job where segmentation, termbase rights and even the source language were set wrong on the server, and the project manager had no idea how to correct the situation. However, things worked out in the end, because I had a playbook of strategies to apply for such a case. Generally, better training and a good understanding of the interfaces to the processes our partners use can get us past most problems.

Nov 22, 2013

memoQuickie: keyboard shortcuts for migrants (updated)

(PM - Pro - 2013R2 - 2013 - 6.2 - 6.0 - 5.0)

You can adapt memoQ keyboard shortcuts to your personal preferences or make them ergonomically compatible with other translation environment tools you use frequently, for better productivity and a reduced risk of errors.

Although keyboard shortcuts can be managed in the Resource Console, it is more useful to do so under Tools > Options… > Keyboard shortcuts, because that is the only place where a given set of keyboard shortcuts can be selected for use. Marking the checkbox for a list in the dialog shown above will make it the active one.

Look carefully at the keyboard shortcuts available in memoQ. Not all of these commands are found in menus (for example, the shortcut for quick search with selected text in a translation grid, Ctrl+Shift+F by default). To examine a set of keyboard shortcuts, select it and click Edit to show the list.

To change a keyboard shortcut, select the value in the Shortcut key column of the editing dialog and press the new key combination.

Oct 6, 2013

OmegaT workshop in Holten (NL) November 18th!

OmegaT is free software, but time is money, and to use this excellent Open Source tool effectively and enjoy its many benefits, expert guidance can be enormously helpful. This software holds its own with commercial leaders such as SDL Trados Studio and memoQ in many respects and surpasses them in some (for example, in its ability to read embedded objects, including charts, in Microsoft Office 2007, 2010 and 2013 documents).

On November 17 and 18, the Stridonium Holten Lectures will feature Marc Prior in a workshop for OmegaT, a professional computer-assisted translation environment originally developed by Keith Godfrey and currently maintained and extended by a team led by Didier Briel. It is available in nearly 30 languages and includes:

• fuzzy matching
• match propagation
• simultaneous processing of multiple-file projects
• simultaneous use of multiple translation memories
• user glossaries with recognition of inflected forms

Document file formats include:

• Microsoft Office: Word, Excel, PowerPoint (DOCX, XLSX, PPTX, etc.)
• other translation tool formats such as TMX, TTX, TXML, XLIFF & SDLXLIFF
• XHTML and HTML
• Open Document formats (LibreOffice, OpenOffice.org)
• MediaWiki (Wikipedia)
• plain text

... and around 30 other file types as well as

• Unicode (UTF-8) support for non-Latin alphabets
• support for right-to-left languages (Hebrew, Arabic, etc.)
• an integral spelling checker
• MT integration with Google Translate

Marc has been a technical translator since 1988, working primarily from German to English. In 2002, he joined Keith Godfrey, the original author of OmegaT, to launch the program as an open-source (free) project. Since then, he has been involved in the project in various capacities, including

• project co-ordination
• Match propagation
• authoring of manuals
• localization co-ordination
• website management and
• programming of auxiliary tools

He is a frequent source of advice and support on OmegaT user forums, has contributed to the knowledge of the user community in many other online venues (including this blog) and has spoken on OmegaT at events in Germany and Belgium. He also introduced a module for computer-assisted translation tools, based on OmegaT, to the Professional Support Group (PSG) of the UK's Institute of Translation and Interpreting (ITI). He currently lives in Gelsenkirchen in the German federal state of North Rhine-Westphalia.

The day will start with a conceptual overview of OmegaT, followed by a session demonstrating a sample project and answering participants' questions.

After lunch, the first afternoon session will present some extensions and advanced functions of OmegaT.

In the final session of the day, Marc will discuss “drawbacks that aren’t”, answer questions and debunk myths (appropriately entitled “Myths, FAQs and Workarounds”).

The workshop fee is €250 (€225 for Stridonium members), which includes a room at the venue for Sunday night arrivals who can enjoy an optional networking dinner and get a fresh start in the teaching sessions the next day. The availability of rooms included in the workshop fee is limited, so book early!

The workshop is designed for translators and language project managers interested in the many possibilities for using this Open Source tool in their work.

Further information and updates can be found on the Stridonium events page, which also includes a button link for registration and payment ("Register for the Holten Lectures 3") below the course description.

You can also follow @stridonium on Twitter and watch the hash tags #strido and #Holten for announcements.

CPD points have been applied for with Bureau BTV in the Netherlands.

Previous Stridonium workshops in Holten have included corpus linguistics with the NIFTY method last June (participant's report here) and the recent teamwork day which presented ideas for overcoming distance in collaboration and using free and Open Source technologies as alternatives or addenda to more restrictive, proprietary commercial server solutions (participant's report here).

Future event plans include legal English for the insurance sector with a UK attorney and a series of three workshops on legal English for contracts (April 28, 2014), legal drafting (May 19, 2014) and commercial law (June 2, 2014) with attorney Stuart Bugg. These events would also interest practicing attorneys and others involved in the drafting and revision of contracts.

How to get there:
From Deventer (A1)

- Take the A1 towards Hengelo/Enschede

- Exit 26: Lochem/Holten

- Turn left for Raalte, follow the signs for Holterberg

- Go straight ahead over the roundabout, turn right after the viaduct and left at the T junction

- Turn left at the roundabout and after 50 m take a right turn for Holterberg

- After approx 1 km turn right (at yellow building)

From Enschede/Hengelo (A1)

- A1 towards Deventer/Apeldoorn/Amsterdam

- Exit 27: Holten/Markelo

- Continue through the center of Holten, take the Holterberg exit at the roundabout and after 50 m take a right turn for Holterberg

- After approx 1 km turn right (at yellow building)

By train

- A 10-minute walk from the station (Beukenlaantje)

- Let the organizers know when you arrive and either they or hotel staff will collect you!

Sep 2, 2013

The Holten Lectures: upcoming CPD events

After its debut with a day on corpus compilation and analysis for legal terminology last June, the Stridonium series of Holten Lectures will continue with two events planned for this autumn at the venue in the east of the Netherlands.

September 30th will present concepts and strategies for "The Third Way" - methods and technologies to support teams of translators collaborating and exchanging information effectively from any location. The day will include presentations and practice with simple tools for small "conferences" for project planning, content consultation and instruction, practical guidelines and hands-on practice for dynamic terminology exchange and maintenance, making translation memories available even to those who do not use CAT tools and a basic overview of "interoperable" formats to determine the most effective strategies for data sharing. Workshop instruction will be a blend of on-site and remote teaching to better emphasize the day's lessons. The fee for this workshop, including the hotel room Sunday night (but excluding the cost of dinner), is €250 (€225 for Stridonium members). The availability of rooms included in the workshop fee is limited, so book early. Online registration will open next week, but arrangements can be made before then via the e-mail contact below.

On November 17th, there will be a one-day workshop on the Open Source translation environment tool OmegaT, taught by Marc Prior, who has been involved in advocacy, education and coordination with that tool for many years. I've been frankly amazed by the range of functions offered by this free software and the fact that it often provides features and file compatibility not available in leading commercial tools. And, as with many other tools, there are options for integrating various kinds of server resources. For individual or even corporate users this is a tool worth taking seriously and perhaps including in your workflows.

Each Monday event in Holten offers a Sunday night arrival and networking dinner to prepare for a refreshed start early the next day with the sessions running from 9 am to 5:30 pm. The days I spent at this venue last summer impressed me with the quality of its intimate meeting facilities, outstanding staff service and cuisine and beautiful natural environment around the hotel on the outskirts of the village.

An application for continuing professional education points has been made for these events with the Bureau BTV for those in the Netherlands who require them to maintain their certified status.

More information on these and other events will be forthcoming; for further details on schedules, content and availability, you can also contact info (at) stridonium (dot) com for updates.

Jun 29, 2013

Caption editing for YouTube videos

I've spent a great deal of time in recent weeks examining different means for remote instruction via the Internet. In the past I've had good success with TeamViewer to work on copywriting projects with a partner or deliver training to colleagues and clients at a distance. So far I have avoided doing webinars because of the drawbacks I see for that medium, both as an instructor and as a participant, but I haven't completely excluded the possibility of doing them eventually. I've also looked at course tools such as Citrix GoToTraining and a variety of other e-learning platforms, including Moodle, which is used by universities and schools around the world and also seems to be the choice of Kilgray, ProZ and others for certain types of instruction.

Recorded video can be useful with many of these platforms, and since I've grown tired of doing the same demonstrations of software functions time and again, I've decided to record some of these for easy sharing and re-use. When I noticed recently that my Open Source screen recording software, CamStudio, had been released in a new version, I decided quite spontaneously to make a quick video of pseudotranslation in memoQ to test whether a bug in the cursor display for the previous version of CamStudio had been fixed.

After I uploaded the pseudotranslation demo to YouTube, I noticed that rather appalling captions (subtitles) had been created by automatic voice recognition. Although voice recognition software such as Dragon NaturallySpeaking is usually very kind to me, Google's voice recognition on YouTube gave miserable results.

I soon discovered, however, that the captions were easy to edit and could also be exported as text files with time cues. These text files can be edited very easily to correct recognition errors or combine segments to improve the timing and subtitle display.
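For longer videos, the segment merging can also be scripted. Below is a minimal Python sketch that assumes the captions were saved as an SRT file (numbered cues, a "start --> end" timing line, then the text). The function names and the 1.5-second threshold for a "too short" cue are hypothetical choices for illustration.

```python
import re
from dataclasses import dataclass

TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def to_ms(stamp):
    """Convert an SRT timestamp such as 00:01:02,500 to milliseconds."""
    h, m, s, ms = map(int, TIME.fullmatch(stamp.strip()).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def to_stamp(ms):
    """Convert milliseconds back to an SRT timestamp."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def parse_srt(data):
    cues = []
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = block.splitlines()
        # lines[0] is the cue number, lines[1] the timing line, the rest is text
        start, end = (part.strip() for part in lines[1].split("-->"))
        cues.append(Cue(to_ms(start), to_ms(end), " ".join(lines[2:])))
    return cues


def merge_short(cues, min_ms=1500):
    """Join any cue shorter than min_ms (hypothetical threshold) with the next one."""
    merged = []
    for cue in cues:
        if merged and merged[-1].end_ms - merged[-1].start_ms < min_ms:
            prev = merged[-1]
            merged[-1] = Cue(prev.start_ms, cue.end_ms, f"{prev.text} {cue.text}")
        else:
            merged.append(cue)
    return merged


def write_srt(cues):
    """Renumber the cues and write them back out in SRT form."""
    return "\n\n".join(
        f"{i}\n{to_stamp(c.start_ms)} --> {to_stamp(c.end_ms)}\n{c.text}"
        for i, c in enumerate(cues, start=1)
    ) + "\n"
```

Fixing the recognition errors themselves is still a job for a human reader; a script like this only takes care of the tedious part of re-timing and renumbering.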

Once the captions for the original language are cleaned up and the timing is improved, the text files can be translated and uploaded to the video in YouTube to create caption tracks in other languages. As a test, I did this (with a little help from my friends), adding tracks for German and European Portuguese to the pseudotranslation demo. And if anyone else cares to create another track for their native language from this file, I'll add it with credits at the start of the track.
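If you would rather not have a translator (or a CAT tool filter) touch the numbering and time cues at all, the text lines can be pulled out for translation and put back afterwards. A minimal Python sketch, again assuming an SRT-style file; the placeholder scheme ({0}, {1}, ...) and the function names are my own hypothetical choices.

```python
import re

TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3} --> \d{2}:\d{2}:\d{2}[,.]\d{3}")


def split_caption_file(data):
    """Return (skeleton, texts); each caption text line becomes a placeholder."""
    skeleton, texts = [], []
    for line in data.splitlines():
        stripped = line.strip()
        # Blank lines, cue numbers and timing lines are structure, kept as-is.
        # Caveat: a caption line made up only of digits would be kept too.
        if not stripped or stripped.isdigit() or TIMING.match(stripped):
            skeleton.append(line)
        else:
            skeleton.append(f"{{{len(texts)}}}")  # placeholder such as {0}
            texts.append(line)
    return skeleton, texts


def rebuild_caption_file(skeleton, translations):
    """Put translated lines back into the placeholders, in the same order."""
    out = []
    for line in skeleton:
        match = re.fullmatch(r"\{(\d+)\}", line)
        out.append(translations[int(match.group(1))] if match else line)
    return "\n".join(out) + "\n"
```

The list of texts can go into any translation environment as plain text, one line per segment; the translated lines then replace the placeholders and the timing of the cleaned-up original is preserved exactly.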

It's easy enough to understand why I might want to add captions in other languages to a video I record in English or German. But why would I want to do so in the original language? My thick American accent is one reason. I like to imagine that my English is clear enough for everyone to understand, but that is a foolish conceit. Of course I speak clearly - I couldn't use Dragon successfully if that were not true. But someone with a knowledge of English mostly based on reading or interacting with people who have very different accents might have trouble. It happens.

Although most of the demonstration videos SDL has online for SDL Trados Studio are easy to follow, some of the thick UK accents are really frightening and difficult for some people in places like Flyover America to follow. Some Kilgray videos of excellent content are challenging for those unaccustomed to the accents, and the many wonderful demos of memoQ, Wordfast, OmegaT and other tools by CAT Guru on YouTube would have been difficult for me before I was exposed to the linguistic challenges of the wide English-speaking world. All of these excellent resources in English would benefit from clear English subtitles.

How difficult is it to create captions? The three-minute pseudotranslation demo cost me about ten minutes of work to clean up the subtitles. The English captions for another slightly shorter video explaining the use of the FeeWizard Online to estimate equivalent rates for charging by source or target words, lines, pages, etc. also took me about 10 or 15 minutes with all the text and timing corrections. And I've spent a good bit of time in the past week transcribing a difficult spoken English lecture by a German professor: it took me about 7 hours of transcription work to cope with a spoken hour. I don't know if this is typical, because I almost never do this sort of thing, and there were a lot of WTF moments. But I suppose three to seven times the recording length might be a reasonable range for estimating the effort of a draft edit and some timing changes. Not bad, really.
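A quick check of the arithmetic behind that rule of thumb, using only the two cases with clear figures above (the three-minute demo and the one-hour lecture). The 20-minute video in the usage example is hypothetical.

```python
def effort_ratio(work_minutes, recording_minutes):
    """Minutes of caption work per minute of recording."""
    return work_minutes / recording_minutes


demo = effort_ratio(10, 3)           # three-minute demo, about ten minutes of cleanup
lecture = effort_ratio(7 * 60, 60)   # one spoken hour, about seven hours of transcription


def estimate_minutes(recording_minutes, low=3, high=7):
    """Rough effort range for a recording, using the 3x to 7x rule of thumb."""
    return recording_minutes * low, recording_minutes * high


if __name__ == "__main__":
    print(f"demo: {demo:.1f}x, lecture: {lecture:.1f}x")
    print(estimate_minutes(20))  # a hypothetical 20-minute video: (60, 140) minutes
```

So the demo came in at a bit over three times its running time and the difficult lecture at seven times, which is where the range comes from.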

So if you are involved in creating instructional videos to put on YouTube or use elsewhere, please consider this easy way of making good work even better by investing a little time in caption creation and editing. Once you have done this for the original language, it will also be a simple matter to translate those captions to make your content even more accessible.

Dec 7, 2012

Terminology collaboration with Google Docs: new twists

A few years ago, I put a notice in this blog about a colleague's interesting use of Google Docs to share terminology with faraway colleagues in a project. Earlier this year I enjoyed a similar collaboration with a Google Docs spreadsheet used to exchange and update terminology on a very time-critical annual report with translators using two different versions of Trados, memoQ and no CAT tools at all.

Sharing information via Google Docs was quite easy, and we were able to configure the access rights without a lot of trouble. But at the time I still had a bit of extra, annoying effort to get the data imported into my working environment for frequent updates.

Tonight another colleague contacted me with basically the same problem. Her client manages data in an Excel spreadsheet, which gets updated and sent out frequently. She already had the idea that this might work better in Google Docs, and I agreed.

But I kept thinking about that annoying update problem....

One can, of course, export Google Docs spreadsheet data in various formats:

I've marked a few of the export ("download") formats which are probably useful for a subsequent import into a translation environment too. But the downloaded data still won't be in the "perfect" format in many cases, and there will be extra steps involved in matching it up to the fields in your term base.

One way to simplify this problem is to create another online spreadsheet in Google Docs and link it to the original, shared spreadsheet. In this second spreadsheet, which is your "personal" copy for use in your favorite tool, you reformat the data so they will export in a form that makes your later import to your tool's termbase easier.
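To make the idea of a reshaped "view" concrete, here is the same reshaping done locally on a CSV download, as a minimal Python sketch. The column names on both sides of COLUMN_MAP are hypothetical placeholders, not memoQ's actual field names; replace them with the headers of the shared sheet and the fields your own termbase export uses.

```python
import csv
import io

# Hypothetical mapping: shared sheet column -> column your termbase import expects
COLUMN_MAP = {
    "Term (PT)": "Portuguese",
    "Term (EN)": "English",
    "Definition": "English_Def",
}


def reshape(shared_csv, column_map=COLUMN_MAP):
    """Reorder and rename columns; unmapped columns (e.g. comments) are dropped."""
    reader = csv.DictReader(io.StringIO(shared_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(column_map.values()), lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow({new: row.get(old, "") for old, new in column_map.items()})
    return out.getvalue()
```

The online second spreadsheet has the advantage that the reshaping happens automatically whenever the shared data change; a local script has to be rerun on each new download.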

In my case, I use memoQ, so I created a Google Docs spreadsheet with the first row containing the default field names of interest from the CSV export of my memoQ termbase:

I linked the columns in my personal online spreadsheet with the shared spreadsheet using the ImportRange function. It has two arguments, both of which have to be enclosed in quotes. The first one (argument #1 above) is the key for the online spreadsheet to be referenced; it is shown in the URL of the online spreadsheet (just look in the address bar of your browser and you will see it). The second one specifies the sheet and the range of cells to copy. I put this formula in one cell and it copied the entire column for me.
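If you need one such formula per column, it may be easier to generate them than to type them. A minimal Python sketch; the key "YOUR_SHARED_SHEET_KEY" and the sheet names are placeholders, and the helper names are hypothetical.

```python
import re


def sheet_ref(sheet):
    """Quote sheet names containing spaces or punctuation, e.g. 'Term list'."""
    # Names containing apostrophes are not handled in this sketch
    if re.fullmatch(r"\w+", sheet):
        return sheet
    return f"'{sheet}'"


def importrange_formula(spreadsheet_key, sheet, column):
    """Build an IMPORTRANGE formula that copies one whole column from another spreadsheet."""
    # Both arguments are quoted strings: the spreadsheet key and "sheet!range"
    return f'=IMPORTRANGE("{spreadsheet_key}", "{sheet_ref(sheet)}!{column}:{column}")'


if __name__ == "__main__":
    # Paste the real key from the shared spreadsheet's address bar instead
    for col in "ABC":
        print(importrange_formula("YOUR_SHARED_SHEET_KEY", "Sheet1", col))
```

Each printed line can be pasted into the top cell of the matching column in the personal sheet. Current versions of Google Sheets also ask you to allow access the first time one spreadsheet imports from another.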

I could, if I wanted to, use conditional (IF) statements and other tricks to transform some data in columns of the other sheet and build the semicolon-delimited term properties list (Term_Info) that memoQ uses to keep track of gender, capitalization enforcement, forbidden status, etc. But none of that is needed for simple sharing of terms, definitions and examples for instance.

I simply export my personal Google Docs spreadsheet as CSV, then import it into the desired termbase in memoQ. If I have IDs set for the term entries in the online spreadsheet, I could even choose ID-based updates of my local termbase when I do the import.
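The idea behind an ID-based update can be shown with a small Python sketch. This illustrates only the general concept (entries matched on their ID rather than on the term text); it is not memoQ's actual import logic, and the field names are hypothetical.

```python
def update_by_id(local, incoming):
    """Merge incoming term entries into local ones, matching on the entry ID.

    Known IDs are updated field by field, new IDs are added, and local entries
    missing from the incoming data are left alone. Concept sketch only.
    """
    merged = {entry_id: dict(fields) for entry_id, fields in local.items()}
    for entry_id, fields in incoming.items():
        merged[entry_id] = {**merged.get(entry_id, {}), **fields}
    return merged
```

Matching on IDs means that a corrected term updates its existing entry instead of creating a near-duplicate next to the old one.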

Those who use other tools, such as Trados, OmegaT or Wordfast, can set up their spreadsheets and do exports as best suits their needs.

This approach enables you to take source data in nearly any format in an online spreadsheet and rework it for the greatest convenience in the tool of your choice. Although not a "perfect" solution, it is perhaps a convenient one until better resources are commonly available for dynamic, cross-platform translation collaboration.

So what do I recommend my friend try as a first step? Maybe take the client's latest spreadsheet, copy and paste it into Google Docs and share it with the client and others on the team. Then it's already "up there" for everyone's convenience (local XLSX copies can be downloaded any time), and she can get on with creating a convenient "view" of this shared data in her personal spreadsheet, which can be exported for local use any time. That personal sheet could also be shared (read-only access recommended) with other team members using the same translation environment tool.
