Translation Tribulations: bilingual files


Aug 26, 2019

Exporting compatible XLIFF (XLF) bilingual files from memoQ

Here we go again. Although memoQ is the undisputed leader for compatibility and interoperability among translation environment tools, users still encounter problems exchanging files, particularly XLIFF of some sort, with users of other tools. This is not because of any actual difficulty producing compatible XLIFF files, but rather a matter of deficient tool training and the failure to date by memoQ product designers to make the ease of interoperability a little more obvious. Some other tools, like recent versions of SDL Trados Studio, come pre-configured on installation to recognize the proprietary file extensions for memoQ's flavor of XLIFF ("MQXLIFF") and renamed ZIP packages (MQXLZ) containing XLIFF files, but others (or versions of SDL Trados Studio from many years ago) need to be configured to recognize those extensions, or someone simply has to change the MQXLIFF file extension to an extension that will be recognized by any tool: *.xliff or *.xlf are the choices.

The two-step solution is shown here:

On the Documents ribbon in memoQ, click on the tiny arrow under the Export icon and choose the option to export a bilingual file. There is some blue text which, if clicked, will allow a compatible XLIFF file to be exported, albeit with the MQXLIFF extension that some other programs might not recognize.

When the Export button in the dialog (marked 1, above) is clicked, the Save As dialog (marked 2, above) appears. There, simply change the file extension (the part after the period) to "xlf", for example. Then any program that reads XLIFF files can work with the file you export from memoQ. Despite the change of extension, memoQ will still recognize the file it produced, so it is possible to re-import it, for example if another person has made corrections to the XLIFF file that you want to use to update your translation or reference resources.

In some much older versions of memoQ, it does not work to change the extension in the export dialog; this has to be done directly to the exported file in whatever folder you save it in.
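If a whole folder of exported files needs the new extension, the renaming can be scripted. What follows is only a minimal sketch in Python, not a memoQ feature: the function name `copy_with_xlf_extension` is made up for illustration, and it copies the files rather than renaming them so that the original *.mqxliff exports stay untouched.

```python
from pathlib import Path
import shutil


def copy_with_xlf_extension(folder, new_suffix=".xlf"):
    """Copy every *.mqxliff file in `folder` to a sibling file with `new_suffix`.

    The content is not modified; only the extension of the copy differs.
    Returns the list of newly created paths.
    """
    copies = []
    for src in sorted(Path(folder).glob("*.mqxliff")):
        dst = src.with_suffix(new_suffix)  # e.g. contract.mqxliff -> contract.xlf
        shutil.copyfile(src, dst)
        copies.append(dst)
    return copies
```

Calling `copy_with_xlf_extension("C:/exports")` (a hypothetical folder) would leave each MQXLIFF file in place and put an identically named *.xlf copy next to it.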

Of course, all of this will be rather difficult if you are one of those users who has not fixed the awful Microsoft Windows default to hide the extensions of known file types. Fixing that particular stupidity requires slightly different measures in different versions of Windows, but in Windows 10 you can do that on the View ribbon of Windows Explorer by marking the choice to show file name extensions:

Jun 5, 2017

Technology for Legal Translation

Last April I was a guest at the Buenos Aires University Facultad de Derecho, where I had an opportunity to meet students and staff from the law school's integrated degree program for certified public translators and to speak about my use of various technologies to assist my work in legal translation. This post is based loosely on that presentation and a subsequent workshop at the Universidade de Évora.

Useful ideas seldom develop in isolation, and to the extent that I can claim good practice in the use of assistive technologies for my translation work in legal and other domains it is largely the product of my interactions with many colleagues over the past seventeen years of commercial translation activity. These fine people have served as mentors, giving me my first exposure to the concepts of platform interoperability for translation tools, and as inspirations by sharing the many challenges they face in their work and clearly articulating the desired outcomes they hoped to achieve as professionals. They have also generously and frequently shared with me the solutions that they have found and have often unselfishly shared their ideas on how and why we should do better in our daily practice. And I am grateful that I can continue to learn with them, work better, and help others to do so as well.

A variety of tools for information management and transformation can benefit the work of a legal translator in areas which include but are not limited to:

corpus utilization,
text conversion,
terminology management,
diverse information retrieval,
assisted drafting,
dictated speech to text,
quality assurance,
version control and comparison, and
source and target text review.

Though not exhaustive, the list above can provide a fairly comprehensive basis for education of future colleagues and continued professional development for those already active as legal translators. But with any of the technologies discussed below, it is important to remember that the driving force is not the hardware and software we use in technical devices but rather the human mind and its understanding of subject matter and the needs of the particular task or work process in the legal domain. No matter how great our experience, there is always something more and useful to be learned, and often the best way to do this is to discuss the challenges of technology and workflow with others and keep an open mind for new approaches with promise.

Reference texts of many kinds are important in legal translation work (and in other types of translation too, of course). These may be monolingual or multilingual texts, and they provide a wealth of information on subject matter, terminology and typical usage in particular contexts. These collections of text – or corpora – are most useful when the information found in them can be read in context rather than isolation. Translation memories – used by many in our work – are also corpora of a kind, but they are seriously flawed in their usual implementations, because only short segments of text are displayed in a bilingual format, and the meaning and context of these retrieved snippets are too often obscure.

An excerpt from a parallel corpus showing a treaty text in English, Portuguese and Spanish

The best corpus tools for translation work allow concordance searches in multiple selected corpora and provide access to the full context of the information found. Currently, the best example of integrated document context with information searches in a translation environment tool is found in the LiveDocs module of Kilgray's memoQ.

A memoQ concordance search with a link to an "aligned" translation

A past translation and its preview stored in a memoQ LiveDocs corpus, accessed via concordance search

A memoQ LiveDocs corpus has all the advantages of the familiar "translation memory" but can include other information, such as previews of the translated work as well. It is always clear in which document the information "hit" was found, and corpora can also include any number of monolingual documents in source and target languages, something which is not possible with a traditional translation memory.

In many cases, however, much context can be restored to a traditional translation memory by transforming it into a "document" in a LiveDocs corpus. This is because in most cases, substantial portions of the translation memory will have their individual segment records stored in document order; if the content is exported as a TMX file or tab-delimited text file and then imported as a bilingual document in a LiveDocs corpus, the result will be almost as if the original translations had been aligned and saved, and from a concordance hit one can open the bilingual content directly and read the parts before and after the text found in the concordance search.
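For those who prefer to prepare the tab-delimited file themselves, the conversion from TMX can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of what any particular tool does: the function names are invented, languages are matched on the main language code only (so "de-DE" and "de-CH" both count as "de"), and the translation units are simply kept in the order in which they appear in the file.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute as ElementTree reports it (older TMX files use plain "lang").
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_rows(tmx_text, source_lang, target_lang):
    """Return (source, target) pairs from TMX content, in file order."""
    root = ET.fromstring(tmx_text)
    rows = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tags
                text = " ".join("".join(seg.itertext()).split())
                segments[lang.split("-")[0]] = text
        if source_lang in segments and target_lang in segments:
            rows.append((segments[source_lang], segments[target_lang]))
    return rows


def rows_to_tsv(rows):
    """Join the pairs into tab-delimited lines, one translation unit per line."""
    return "\n".join(f"{source}\t{target}" for source, target in rows)
```

The resulting text can be saved with a *.txt extension and imported as a two-column bilingual document; whether the rows really reflect the original document order depends entirely on how the translation memory was filled.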

Legal translation can involve text conversion in a broad sense in many ways. Legal translators must often deal with hardcopy or faxed material or scanned files created from these. Often documents to translate and reference documents are provided in portable document format (PDF), in which finding and editing information can be difficult. Using special software, these texts can be converted into documents which can be edited, and portions can be copied, pasted and overwritten easily, or they can be imported into translation assistance platforms such as SDL Trados Studio, Wordfast or memoQ. (Some of these environments include integrated facilities for converting PDF texts, but the results are seldom as suitable for work as PDF or scanned files converted with optical character recognition software such as ABBYY FineReader or OmniPage.)

Software tools like ABBYY FineReader can also convert "dead" scanned text images into searchable documents. This will even work with bad contrast or color images in the background, making it easier, for example, to look for information in mountains of scanned documents used in legal discovery. Text-on-image files like the example shown above completely preserve the layout and image context of the text to be read. I first discovered and used this option while writing a report for a client in which I had to reference sections of a very long, scanned policy document from the European Parliament. It was driving me crazy to page through the scanned document to find information I wanted to cite but for which I had failed to make notes during my first reading. Converting that scanned policy to a searchable PDF made it easy to find what I needed in seconds and accurately cite its page number, etc. Where there is text on pictures, difficult contrast and other such features, this is often far better for reference purposes than converting to an MS Word document, for example, where the layouts are likely to become garbled.

Software tools for translation can also make text in many other original formats accessible to translators in an ergonomically simpler form, also ensuring, where necessary, that no text is overlooked because of a complicated layout or because it is in an easily overlooked footnote or margin note. Text import filters in translation environments make it easy to read and translate the words in a uniform working environment, with many reference tools and other help available, and then render the translated text back into its original format or some more useful bilingual format.

An excerpt of translated patent claims exported as a bilingual table for review

Technology also offers many possibilities for identifying, recording and controlling relevant terminology in legal translation work.

Large quantities of text can be analyzed quickly to find the most frequent special vocabulary likely to be relevant to the translation work and save these in project glossaries, often enabling that work to be organized better with much of the clarification of terms taking place prior to translation. This is particularly valuable in large projects where it may be advisable to ensure that a team of translators all use the same terms in the target language to avoid possible confusion and misunderstanding.
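The principle behind such frequency analysis can be shown with a deliberately crude sketch in Python. It is not how memoQ or any other tool extracts terms; the function name, the minimum word length and the stopword handling are all assumptions made for the sake of illustration, and real term extraction also considers multi-word expressions.

```python
import re
from collections import Counter


def candidate_terms(text, stopwords=frozenset(), min_length=4, top=10):
    """Return the most frequent words of `text` as (word, count) pairs.

    Words shorter than `min_length` and words in `stopwords` are ignored.
    """
    # Runs of letters, optionally joined by hyphens; digits and underscores excluded
    words = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())
    counts = Counter(
        word for word in words
        if len(word) >= min_length and word not in stopwords
    )
    return counts.most_common(top)
```

A list like this is only raw material: a person who knows the subject matter still has to decide which candidates belong in the project glossary.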

Glossaries created in translation assistance tools can provide terminology hints during work and even save keystrokes when linked to predictive, "intelligent" writing features.

Integrated quality checking features in translation environments enable possible deviations of terminology or other issues to be identified and corrected quickly.

Technical features in working software for translation allow not only desirable terms to be identified and elaborated; they also enable undesired terms to be recorded and avoided. Barred terms can be marked as such while translating or automatically identified in a quality check.

A patent glossary exported from memoQ and then made into a PDF dictionary via SDL Trados MultiTerm

Technical tools enable terminology to be shared in many different ways. Glossaries in appropriate formats can be moved easily between different environments to share them with others on a team which uses diverse technologies; they can also be output as spreadsheets, web pages or even formatted dictionaries (as shown in the example above). This can help to ensure consistency over time in the terms used by translators and attorneys involved in a particular case.

There are also many different ways that terminology can be shared dynamically in a team. Various terminology servers available usually suffer from being restricted to particular platforms, but freely available tools like Google Sheets coupled with web look-up interfaces and linked spreadsheets customized for importing into particular environments can be set up quickly and easily, with access restricted to a selected team.

The links in the screenshot above show a simple example using some data from SAP. There is a master spreadsheet where the data is maintained and several "slavesheets" designed for simple importing into particular translation environment tools. Forms can also be used for simplified data entry and maintenance.

If Google Sheets do not meet the confidentiality requirements of a particular situation, similar solutions can be designed using intranets, extranets, VPNs, etc.

Technical tools for translators can help to locate information in a great variety of environments and media in ways that usually integrate smoothly with their workflow. Some available tools enable glossaries and bilingual corpora to be accessed in any application, including word processors, presentation software and web pages.

Corpus information in translation memories, memoQ LiveDocs or external sources can be looked up automatically or in concordance searches based on whole or partial content matches or specified search terms, and then useful parts can be inserted into the target text to assist translation. In some cases, differences between a current source text and archived information are highlighted to assist in identifying and incorporating changes.

Structured information such as dates, currency expressions, legal citations and bibliographical references can also be prepared for simple keystroke insertion in the translated text or automated quality checking. This can save many frustrating hours of typing and copy revision. In this regard, memoQ currently offers the best options for translation with its "auto-translation" rulesets, but many tools offer rules-based QA facilities for checking structured information.
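To give an idea of what such a rule does, here is a small Python sketch that rewrites numeric German dates in a US English format. It only illustrates the regular-expression approach; it is not a memoQ ruleset, the function name is invented, and memoQ's own rules are written in its own rule editor with a different regex dialect.

```python
import re

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Matches dates such as 31.12.2019 or 5. 6. 2017 (day.month.year)
DATE_DE = re.compile(r"\b(\d{1,2})\.\s?(\d{1,2})\.\s?(\d{4})\b")


def german_date_to_us(text):
    """Replace DD.MM.YYYY dates in `text` with 'Month D, YYYY'."""
    def convert(match):
        day, month, year = int(match.group(1)), int(match.group(2)), match.group(3)
        if not (1 <= month <= 12 and 1 <= day <= 31):
            return match.group(0)  # not a plausible date, leave it alone
        return f"{MONTHS[month - 1]} {day}, {year}"

    return DATE_DE.sub(convert, text)
```

The same pattern, read the other way, is what a rules-based QA check relies on: if the source contains a date that matches the pattern, the target should contain the converted form.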

Voice recognition technologies offer ergonomically superior options for transcription in many languages and can often enable heavy translation workloads with short deadlines to be handled with greater ease, maintaining or even improving text quality. Experienced translators with good subject matter knowledge and voice recognition software skills can typically produce more finished text in a day than the best post-editing operations for machine pseudo-translation, with the exception that the text produced by human voice transcription is actually usable in most situations, while the "gloss" added to machine "translations" is at best lipstick on a pig.

Reviewing a text for errors is hard work, and a pressing deadline to file a brief doesn't make the job easier. Technical tools for translation enable tens of thousands of words of text to be scanned for particular errors in seconds or minutes, ensuring that dates and references are correct and consistent, that correct terminology has been used, et cetera.
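A very simple check of this kind compares the numbers found in a source segment with those in its translation. The Python sketch below is an assumption-laden illustration rather than the logic of any particular QA tool: the function name is invented, and it compares bare runs of digits, so a month written out as a word in the target ("31 December 2019" for "31.12.2019") will be reported and must be judged by the reviewer.

```python
import re
from collections import Counter


def number_mismatches(source, target):
    """Compare the digit runs in two segments.

    Returns (missing, extra): digit runs found in the source but not in the
    target, and digit runs found in the target but not in the source.
    """
    source_numbers = Counter(re.findall(r"\d+", source))
    target_numbers = Counter(re.findall(r"\d+", target))
    missing = sorted((source_numbers - target_numbers).elements())
    extra = sorted((target_numbers - source_numbers).elements())
    return missing, extra
```

Because only the digits are compared, differences in decimal and thousands separators ("1.000,50" versus "1,000.50") do not trigger a warning, while a wrong paragraph or section number does.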

The best tools even offer sophisticated features for tracking changes, differences in source and target text versions, even historical revisions to a translation at the sentence level. And tools like SDL Trados Studio or memoQ enable a translation and its reference corpora to be updated quickly and easily by importing a modified (monolingual) target text.

When time is short and new versions of a source text may follow in quick succession, technology offers possibilities to identify differences quickly, automatically process the parts which remain unchanged and keep everything on track and on schedule.

For all its myriad features, good translation technology cannot replace human knowledge of language and subject matter. Those claiming the contrary are either ignorant or often have a Trumpian disregard for the truth and common sense and are all too eager to relieve their victims of the burdens of excess cash without giving the expected value in exchange.

Technologies which do not assist translation experts to work more efficiently or with less stress in the wide range of challenges found in legal translation work are largely useless. This really does include machine pseudo-translation (MpT). The best “parts” of that swindle are essentially the corpus matching for translation memory archives and corpora found in CAT tools like memoQ or SDL Trados Studio, and what is added is often incorrect and dangerously liable to lead to errors and misinterpretations. There are also documented, damaging effects on one’s use of language when exposed to machine pseudo-translation for extended periods.

Legal translation professionals today can benefit in many ways from technology to work better and faster, but the basis for this remains what it was ten, twenty, forty or a hundred years ago: language skill and an understanding of the law and legal procedure. And a good, sound, well-rested mind.

*******

Further references

Speech recognition

Dragon NaturallySpeaking: https://www.nuance.com/dragon.html
Tiago Neto on applications: https://tiagoneto.com/tag/speech-recognition
Translation Tribulations – free mobile for many languages: http://www.translationtribulations.com/2015/04/free-good-quality-speech-recognition.html
Circuit Magazine - The Speech Recognition Revolution: http://www.circuitmagazine.org/chroniques-128/des-techniques
The Chronicle - Speech Recognition to Go: http://www.atanet.org/chronicle-online/highlights/speech-recognition-to-go/
The Chronicle - Speech Recognition Is in Your Back Pocket (or Wherever You Keep Your Mobile Phone): http://www.atanet.org/chronicle-online/none/speech-recognition-is-in-your-back-pocket-or-wherever-you-keep-your-mobile-phone/

Document indexing, search tools and techniques

Archivarius 3000: http://www.likasoft.com/document-search/
Copernic Desktop Search: https://www.copernic.com/en/products/desktop-search/
AntConc concordance: http://www.laurenceanthony.net/software/antconc/
Multiple, separate concordances with memoQ: http://www.translationtribulations.com/2014/01/multiple-separate-concordances-with.html
memoQ TM Search Tool: http://www.translationtribulations.com/2014/01/the-memoq-tm-search-tool.html
memoQ web search for images: http://www.translationtribulations.com/2016/12/getting-picture-with-automated-web.html
Upgrading translation memories for document context: http://www.translationtribulations.com/2015/08/upgrading-translation-memories-for.html
Free shareable, searchable glossaries with Google Sheets: http://www.translationtribulations.com/2016/12/free-shareable-searchable-glossaries.html

Auto-translation rules for formatted text (dates, citations, etc.)

Translation Tribulations, various articles on specifications, dealing with abbreviations & more: http://www.translationtribulations.com/search/label/autotranslatables
Marek Pawelec, regular expressions in memoQ: http://wasaty.pl/blog/2012/05/17/regular-expressions-in-memoq/

Authoring original texts in CAT tools

Translation Tribulations: http://www.translationtribulations.com/2015/02/cat-tools-re-imagined-approach-to.html

Autocorrection for typing in memoQ

Translation Tribulations: http://www.translationtribulations.com/2014/01/memoq-autocorrect-update-ms-word-export.html

Sep 16, 2015

Getting around language variant issues in memoQ LiveDocs

I was told by some other users that a fundamental change had been made in the way language data are accessed in LiveDocs. It was said that until a few versions ago it had been possible to use documents for reference in LiveDocs regardless of their sublanguage settings. So I was told. The truth is more complicated than that.

According to my tests, memoQ 2015 is the first version of memoQ to have a logically consistent treatment of language variants for both bilingual and monolingual documents in corpora. All the other versions tested (memoQ 2013R2, 2014, 2014R2) are equally screwed up and show the same results.

The "visibility" of a monolingual or bilingual document when viewed in a corpus attached to a project running under memoQ 2015 follows these rules:

the sublanguage (language variant) settings for the source and target languages of the document must match those of the project
or the language setting (of the document or the project) must be generic.

Two rules. Pretty simple. It doesn't matter what version of memoQ the project or corpus was created in, only which version is actively running.
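For those who prefer to see rules as logic, the two rules can be written out in a few lines of Python. This is my reading of the observed behavior and not Kilgray's code: the function names are invented, the language codes such as "de" and "de-CH" are just a convenient notation, and the assumption that a monolingual document only needs to fit either the source or the target language of the project is mine.

```python
def split_lang(code):
    """Split 'de-CH' into ('de', 'ch'); a generic code like 'de' gives ('de', None)."""
    main, _, variant = code.lower().partition("-")
    return main, variant or None


def language_visible(doc_lang, project_lang):
    """Apply the two rules to a single language of a document."""
    doc_main, doc_variant = split_lang(doc_lang)
    project_main, project_variant = split_lang(project_lang)
    if doc_main != project_main:
        return False
    # Rule 2: either side generic; rule 1: the variants match
    return doc_variant is None or project_variant is None or doc_variant == project_variant


def document_visible(doc_langs, project_langs):
    """doc_langs holds one code (monolingual) or two (bilingual: source, target)."""
    project_source, project_target = project_langs
    if len(doc_langs) == 2:
        return (language_visible(doc_langs[0], project_source)
                and language_visible(doc_langs[1], project_target))
    return any(language_visible(doc_langs[0], lang) for lang in project_langs)
```

With a generic project such as `("de", "en")`, every German and English document passes; with `("de-CH", "en")`, a bilingual document set to `("de-DE", "en-US")` drops out because of its German variant.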

I created a test corpus with the following document mix:

The corpus contained 11 documents, both bilingual and monolingual with a mix of generic language settings and settings with language variants specified (such as German for Germany, Switzerland and Liechtenstein and English for Zimbabwe, the US and UK).

In a project running under memoQ 2015 with the languages set to generic German and generic English, all 11 documents in the corpus were accessible.

So if you want access to all LiveDocs corpus data for the major languages of your project, it is necessary to use generic language settings, either when you load the data into LiveDocs (difficult unless you always use the resource console, since adding documents to a corpus from within a project automatically applies the project's language settings!) or in the languages specified for the project itself. And this will only work with memoQ 2015. If you want to apply penalties to particular language variants this can be done using keyword markers (as seen in the screenshot above) and configuring the More penalties tab of the LiveDocs settings file applied to that corpus.

If the same corpus is attached to a project running under memoQ 2015 with language settings for Swiss German and generic English, the documents available from the corpus are these:

For a Swiss German and UK English project under memoQ 2015, this is the picture:

And for a Germany German and US English project:

All the screenshots above can be predicted based on the two rules stated. Work it out.

"But what happens with earlier versions of memoQ?" you might wonder. It's messy. Here is a look at a Swiss German and UK English project under memoQ 2013 R2, 2014 and 2014 R2:

And here's a project with generic German and generic English under memoQ 2013 R2, 2014 and 2014 R2:

In each case the five bilingual documents are visible no matter what the project's language settings are. However, there is strict adherence to language variants and the generic language setting for monolingual documents! In my opinion, that's for the birds. I see no good reason to follow a different rule for data availability in bilingual versus monolingual documents. So in a sense, Kilgray has cleaned up this inconsistency in the latest version of memoQ.

Some have expressed a desire for a "switch" setting to allow language variant settings to be ignored. And perhaps Kilgray will provide such a feature in the future. But the best way to get there now is simply to make your project's language settings generic.

Changing the language settings for bilingual data in an existing LiveDocs corpus

If you have a corpus with a mix of language settings and you want to convert these to generic settings or a particular variant, this can currently be done only for bilingual documents, as follows:

Select the bilingual documents to export from the corpus and export them to a folder. (If you choose to zip them all together, unpack the *.zip file later to make a folder of the exported *.mqxlz files.)
Re-import the *.mqxlz files to the LiveDocs corpus via the Resource Console so you are able to specify the exact language settings you want. In the import dialog, you'll have to change the filter setting manually from "binary" to "XLIFF". These *.mqxlz files are not the same as bilingual files from a translation document in a project and are not recognized automatically.

Unfortunately, there is no way to change the language settings of a monolingual document except to re-import it in the Resource Console in its original form and set the language variant (or generic value) there.

So really, for now, the best way to go seems to be to use memoQ 2015 with generic project language settings.

Sep 15, 2015

A quick trip to LiveDocs for EUR-Lex bilingual texts

Quite a number of friends and respected colleagues use EUR-Lex as a reference source for EU legislation. Being generally sensible people, some of them have backed away from the overfull slopbucket of bulk DGT data and built more selective corpora of the legislation which they actually need for their work.

However, the issue of how to get the data into a usable form with a minimum of effort has caused no little trouble at times. The various texts can be copied out or downloaded in the languages of interest and aligned, but depending on the quality of the alignment tool, the results are often unsatisfactory. I've been told that AlignFactory does a better job than most, but then the question of how best to deal with the HTML bitexts from AlignFactory remains.

memoQ LiveDocs is of course rather helpful for quick and sometimes dirty alignment, but if the synchronization of the texts is too many segments off, it is sometimes difficult to find the information one needs even when the (bilingual) document is opened from the context menu in a concordance window.

EUR-Lex offers bi- or tri-lingual views of most documents in a web page. The alignments are often imperfect, but the synchronization is usually off by only one or two segments, so finding the right text in a document's context is not terribly difficult. So these often imperfect alignments are usually quite adequate for use as references in a memoQ LiveDocs corpus. Here is a procedure one might follow to get the EUR-Lex data there.

The bilingual text of a view such as the one above can be selected by dragging the cursor to select the first part of the information, then scrolling to the bottom of the window and Shift+clicking to select all the text in both columns:

Copy this text, then paste it into Excel:

Then import the Excel file as a file for "translation" in a memoQ project with the right language settings. Because of quirks with data access in LiveDocs if the target language variants are specified and possibly not matched, I have created a "data conversion project" with generic language settings (DE + EN in my case as opposed to my usual DE-DE + EN-US project settings) to ensure that data stored in LiveDocs will be accessed without trouble from any project. (This irritating issue of language variants in LiveDocs was introduced a few versions ago by Kilgray in an attempt to placate some large agencies, but it has caused enormous headaches for professional translators who work with multiple sublanguage settings. We hope that urgent attention will be given to this problem soon, and until then, keep your LiveDocs language data settings generic to ensure trouble-free data access!)

When the Excel file is added to the Translations file list, there are two important changes to make in the import options. First, the filter must be changed from Microsoft Excel to "multilingual delimited text" (which also handles multilingual Excel files!). Second, the filter configuration must be "changed" to specify which data is in the columns of interest.

The screenshot above shows the import settings that were appropriate for the data I copied from EUR-Lex. Your settings will likely differ, but in each case the values need to be checked or set in the fields near the arrows ("Source language" particularly at the top and the three dropdown menus by the second arrow below).

Once the data are imported, some adjustments can be made by splitting or joining segments, but I don't think the effort is generally worth it, because in the cases I have seen, data are not far out of sync if they are mismatched, and the synchronization is usually corrected after a short interval.

In the Translations list of the Project home, the bilingual text can be selected and added to a LiveDocs corpus using the menus or ribbons.

The screenshot below shows the worst location of badly synchronized data in the text I copied here:

This minor dislocation does not pose a significant barrier to finding the information I might need to read and understand when using this judgment as a reference. The document context is available from the context menu in the memoQ Concordance as well as the context menu of the entry appearing in the Translation results pane.

A similar data migration procedure can be implemented for most bilingual tables in HTML files, word processing files or other data sources by copying the data into Excel and using the multilingual delimited text filter.
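If copying and pasting through Excel becomes tedious, the same two-column data can be pulled out of a saved HTML file with a short script and written as tab-delimited text. The Python sketch below rests on several assumptions: the class and function names are invented, the page is assumed to contain a plain (not nested) table, and only the first two cells of each row are kept.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell texts of every table row in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse line breaks and repeated spaces inside the cell
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_tsv(html):
    """Return the first two columns of each table row as tab-delimited text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    output = io.StringIO()
    writer = csv.writer(output, delimiter="\t", lineterminator="\n")
    for row in parser.rows:
        if len(row) >= 2:
            writer.writerow(row[:2])
    return output.getvalue()
```

The text returned can be saved to a file and imported with the multilingual delimited text filter in the same way as the Excel file described above.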

Jan 31, 2014

Re-importing reviewed translations in memoQ server projects.

One of the unexpected benefits of testing the memoQ cloud server is that it gives me a good opportunity to reproduce and test some of the disaster scenarios encountered when working with project managers not fully aware of the implications of their choices when setting up server projects. Many of the problems that come to my attention relate to revision workflows that many experienced translators like to use.

For various reasons, exporting bilingual formats - XLIFF, Wordfast Classic-compatible DOC or RTF tables - is a popular review method. Sometimes these are checked by others who do not use memoQ, sometimes they are convenient for QA with third-party tools or have other perceived advantages. As far as I know, translators can always do bilingual exports from a local installation of memoQ connected to a server project. (I haven't looked for ways to block this, because I find the notion of doing so extremely counterproductive.)

The trouble comes when they want to re-import the corrected and/or commented bilingual file to update the translation. This is possible only by the project manager working in the management window. There's no way for the translator to import a bilingual reviewed document. I asked Kilgray Support about this and was told that this is intentional because of the difficulties which could result in the project. So basically if you edit a bilingual, someone with project manager privileges for that project has to re-import it for you.

Well, not always. Sometimes it works, just a bit differently than one might imagine.

If the project manager sets up the project to use "desktop documents" (as opposed to "server documents"), then it is possible to export bilingual files and re-import them. This cannot be done directly with documents in the Translation list. But it will work with Views of these documents. Or with the full bilingual exports of the documents themselves!

The screenshot above is from a server project configured for desktop documents. For these two project types (with or without web translation enabled), when working from a memoQ desktop client you are able to import any bilingual to update a translation file from this interface.

But wait, that's not all!

Or is it? That command says "Import" and so devious minds might wonder if it is possible to import something other than a bilingual export from one of your translation documents. Indeed, the dialog that appears for file selection tantalizingly offers all supported formats. So I grabbed a DOCX with a financial text and gave it a try:

SWEET SUCCESS! A mere translator, I've cracked the memoQ server and uploaded another document to my project. Visions of Caribbean beach vacations in the warm sun dance through my head as I contemplate all the extra work I can upload to certain client projects and bill because it is, well, right there on that server project they assigned to me.... then I get this message:

General error.
TYPE:
System.NullReferenceException

MESSAGE:
Object reference not set to an instance of an object.

SOURCE:
MemoQ.Project

CALL STACK:
   at MemoQ.Project.ProjectDocument.TranslationDocumentProjectContext.UpdateDocumentDivisionInfos(TranslationDocumentCore doc)
   at MemoQ.Translation.Storage.SqlCeStorageService.SaveDocument(TranslationDocumentCore document, SavePreferences savePref)
   at MemoQ.Translation.Storage.SqlCeStorageService.SaveDocumentAndAllInfos(TranslationDocumentCore document, ICompactSerializable workflowInfo, ICompactSerializable tagDefinitions, ICompactSerializable lqaModel, SavePreferences savePref)
   at MemoQ.Project.TranslationDocImportExport.LocalImportController.doImportOrReimport(ImportTask importTask, String targetLangCode, String docStorageDir, Boolean reimport)
   at MemoQ.Project.TranslationDocImportExport.LocalImportController.DoJob()

That's Geek for "Nice try, buddy... an automated report has just been sent to the NSA and our agents will be at your door shortly." I click again, and that message self-destructs and I am given a second warning:

A knock on the door, then after a stern interview, I sink back into my desk chair and click Continue. Next time I'll stick to importing bilingual exports of documents or views from the project. That works beautifully from the View tab and allows me to work as I prefer, since by now most of my clients with memoQ servers know to use the desktop documents options for my projects. Perhaps in the future, Kilgray's programmers might tighten up the code to trap errors from fools like me who do the unexpected.

But of course, as of memoQ 2013 R2, when translating with the desktop client in server projects, one can usually update a translation with minor edits using the reviewed monolingual target document and the Import reviewed document command in the Translations menu of the project. This won't let you bring in comments, and it does have some (but increasingly fewer) quirks, but in many cases it works quite nicely. A video demonstration of this feature can be seen here.

Nov 16, 2012

Trados 2007 goes to the guillotine - or not?

A recent Twitter exchange reinforced the impression of confusion I had regarding SDL's intentions with the older Trados technology. Many translators, corporate users and language service brokers continue to use the 1990s technology of Trados 2007 (which is the current translation technology of many EU institutions until it is finally phased out starting in the coming year), and recent troubles with the loss of "bilingual DOC" exports in memoQ 6.0.64 brought the matter of the old technology to a very uncomfortable head. (Earlier builds of memoQ must be used, or one must be patient until after the version 6.2 release, when this feature will be re-developed.)

The exchange with the colleague on Twitter as well as the frequent contradictions in ongoing discussions among my friends and clients in the translation world made it clear that definitive answers were needed to abate unnecessary fears and allow people to plan the future of their processes with proper information. So I talked to Paul Filkin, Client Communities Director at SDL, whose Multifarious blog is my favorite resource for reliable information about the technical arcana of Trados.

[KSL]: Judging from recent Twitter traffic, there seems to be some confusion regarding SDL’s plans to discontinue support for SDL Trados 2007. So tell me – are Trados Workbench and TagEditor going away for good at last?

[PF] It’s worth clarifying that we are talking about SDL Trados 2007 and not SDL Trados 2007 Suite. The difference is that the Suite contains the latest version of SDL Trados 2007 (as well as various other applications) which is 8.3.0.363. But to answer your question specifically… no, Trados Workbench and TagEditor are not going away for good just yet. I imagine there will still be users working with the older version of these tools for some time yet, but over time they will of course become obsolete – we just need to allow for that time. The driving forces for anyone hanging onto these old versions will be development of hardware and new operating systems as well as upgrades to authoring systems that the retired versions will no longer be able to support.

[KSL]: What exactly ARE the differences between those two versions?

[PF] The best place to look for all the technical differences is the SDL knowledgebase where you can find a nice article called “What is new in SDL Trados 2007 Suite”:

http://kb.sdl.com/kb/article?ArticleId=2332&source=article&c=12&cid=23

[KSL]: If I am using SDL Trados Studio 2011 and my client expects T2007-style “uncleaned” files, what can I do?

[PF] The safest approach, because of differences between the old Trados versions, is to ask your client to provide you with a fully segmented bilingual file, whether they are after TTX or Bilingual Doc. SDL Trados Studio 2011 supports TTX and Bilingual Doc as a file type without the need for SDL Trados 2007 at all. Your client should be able to provide these files for you because they have the appropriate software already.

The other alternative, if you don't have a copy of SDL Trados 2007 Suite (which you can still purchase with SDL Trados Studio 2011 today), is to use a free application from the SDL OpenExchange called the SDLXLIFF to Legacy Converter. This application can convert your Studio bilingual file to a Bilingual Doc or a TTX. This process caters for two parts in this workflow. First, your client can edit these files in SDL Trados 2007 Suite and clean them into their Translation Memory, and second, you can use the application to import the changes back into your SDLXLIFF so that you have the updated and approved version in your own Translation Memory. You can get this application here:

http://www.translationzone.com/en/openexchange/AppDetails.aspx?appid=194

[KSL]: What are the “dangers” in this approach? Where might it go wrong for my client?

[PF] You still have to provide your client with the “cleaned” file from Studio, however, because the Bilingual Doc or TTX created will not “clean up” into the fully formatted document you started with. This is because the Bilingual Doc or TTX is created from the SDLXLIFF and not from the original source file.

This also means that the SDLXLIFF has been segmented using the new file types in Studio and not with the old file types in Trados 2007 Suite or earlier, so even though your client will be able to clean the file into their Translation Memory, they may lose some ability to fully pretranslate the same source file using Trados 2007 Suite or earlier. This is actually the same problem that could occur when converting the file using memoQ or Wordfast, for example, but as those clients only provide the translator with the source file and not a pretranslated bilingual file in the first place, this doesn't seem to be an issue for them.

So all in all both approaches seem to work… the important thing is to understand what your client wants to do with the file when they get it.

[KSL]: Can these formats be edited and “cleaned” by the client to create a properly formatted target (translated) file?

[PF] Only if they were prepared using SDL Trados 2007 in the first place. There is no substitute for SDL Trados 2007 if the client wants a properly formatted target file and future leverage from their Translation Memory.

[KSL]: At what point can we expect support for TWB and TagEditor formats to be discontinued?

[PF] I think it’s likely that when we release the next version of the software, SDL Trados 2007 Suite and SDL Trados Studio 2009 will be retired. However, the important thing to note is that we have the Trados 2007 infrastructure built into Studio and this allows users to upgrade Translation Memories, handle legacy bilingual files and more importantly use the SDL OpenExchange to develop applications that will support workflows using the older tools. We are already seeing developers looking at ways of improving their older solutions with Studio since we were awarded the EU contract last month.

[KSL]: Does SDL Trados Studio 2011 still include a version of TWB and TagEditor?

[PF] It’s not included automatically but you can still purchase it when you buy SDL Trados 2011. It’s not sold as a separate piece of software anymore.

[KSL]: That's good to know. Will this continue to be the case with the next release (Studio 2013???)?

[PF] The honest answer is we haven’t made a decision on this yet. SDL Trados 2007 Suite is really only needed by people who have to create, rather than use, these legacy files. So in reality these people probably already have it… all they have to do is make sure they always prepare files for those who are translating them. This may be better for them and for the translator.

Sep 4, 2012

memoQ 6 desktop: working with other memoQ users

The best methods for memoQ desktop editions to work with other memoQ users are influenced by the versions of the software you and others use. If these versions are compatible, project information can be shared fully, including previews and status settings for the translation. Otherwise, many of the compromises of working with other translation environments apply.

Bilingual exchange files

Bilingual exchange files are generated via Project home > Translations > Export bilingual. If the other person also uses memoQ 6, the best and "friendliest" option is to create a memoQ XLIFF and include the skeleton and preview. The "skeleton" allows target files to be created. For earlier versions of memoQ, a simple XLIFF with the extension changed to XLF or one of the other bilingual formats will do. In memoQ 6, bilinguals are imported using the Import command (and are recognized automatically); in earlier versions, the Import/update bilingual command is used. The tags will always be respected.
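For anyone who would rather script the extension change than retype it in a Save As dialog, here is a minimal Python sketch. It assumes the export was saved with an .mqxliff extension; the function name copy_as_xlf and the file name job.mqxliff are made up for illustration and are not part of memoQ. It copies rather than renames, so the original export stays untouched.

```python
from pathlib import Path
import shutil


def copy_as_xlf(src, new_suffix=".xlf"):
    """Copy a bilingual export to a sibling file with a generic XLIFF extension.

    The content is not modified in any way; only the file name changes.
    """
    src = Path(src)
    dest = src.with_suffix(new_suffix)  # e.g. job.mqxliff -> job.xlf
    shutil.copyfile(src, dest)          # the original file is left in place
    return dest


if __name__ == "__main__":
    # Hypothetical file name; replace with the path of your own export.
    example = Path("job.mqxliff")
    if example.exists():
        print(copy_as_xlf(example))
```

Passing new_suffix=".xliff" gives the other commonly recognized extension.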

If the bilingual DOC format is used for exchange, the finished work must be exported via Export bilingual. Other export commands produce monolingual target documents. The least complicated format to use for users of memoQ 4.2 to 5.0 is the two-column RTF.

Version 6 TMs and termbases are fully compatible with Version 5 of memoQ, and data can be exchanged with all versions via TMX and delimited formats.
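To give a rough idea of how TMX and delimited formats relate, the following Python sketch reads translation units from a TMX string and writes them out as tab-delimited text. It is not how memoQ does its own import or export. It assumes a plain TMX layout (tu, tuv and seg elements, with the language in xml:lang or the older lang attribute), and the function names are invented for this example. Inline tags inside a segment are simply dropped.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_rows(tmx_text, src_lang, tgt_lang):
    """Return (source, target) text pairs for the two given language codes."""
    root = ET.fromstring(tmx_text)
    src_key, tgt_key = src_lang.lower(), tgt_lang.lower()
    rows = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() keeps the text and discards inline tag elements.
                segs[lang] = "".join(seg.itertext())
        if src_key in segs and tgt_key in segs:
            rows.append((segs[src_key], segs[tgt_key]))
    return rows


def rows_to_tsv(rows):
    """Serialize the pairs as tab-delimited text, one translation unit per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()
```

The language codes must match those in the file exactly apart from case, so "de-DE" will not match a unit tagged only "de".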

Project backups

A fully configured project with all settings, translation files, TMs, termbases and corpora can be sent by creating a backup. On the memoQ Dashboard, select the project and click Backup selected. Warning: backup files can be very large, so you might want to detach very big TMs first, for example. And of course this requires the same version of memoQ to be used.

Handoff packages (PM version only)

If translations have been assigned by name in Project home > Translations, handoff packages for translators and reviewers, including necessary resources, can be created after running a check on Project home > Overview > General > Handoff checks. This, too, requires the same version of memoQ.

Another point to consider: memoQ versions 5 and 6 can co-exist on the same desktop computer, so if you need to continue working with clients who have the version 5 server, for example, there is little to stand in the way of upgrading to version 6. The only real difficulty might arise if you want to attach corpora you have migrated; this may require restoring the LiveDocs corpora from the backup of the old version or creating a new one.
