Translation Tribulations: SDLXLIFF


Jan 25, 2015

SDL conquers translation at Universidade Nova in Lisbon

The day started inauspiciously for me, with a TomTom navigation system determined to keep me from the day planned at Lisbon's New University to discuss SDL Trados Studio and its place in the translation technology ecosphere. When the fourth GPS location almost proved a charm, I hiked the last kilometer on an arthritic foot, swearing furiously that this was my last visit to the Big City. I found the lecture hall at last, an hour and a half late, arriving just after Paul Filkin's presentation of the SDL OpenExchange, an underused but rather interesting and helpful resource center for plug-ins and other resources that help SDL Trados Studio victims to bridge the gap between its out-of-the-box configurations and what particular users or workflows might require. There are a lot of good things to be found there; the memoQ XLIFF definition and Glossary Converter are my particular favorites. Paul talked about many interesting things, I was told, and there is even a plug-in created for SDL Trados Studio by a major governmental organization which has functionality much like memoQ's LiveDocs (discussed afterward, but not shown in the talk I missed). In the course of the day, Paul also disclosed an exciting new feature for SDL Trados Studio which many memoQ users have been missing in the latest version, memoQ 2014 R2 (see the video at the end).

I arrived just in time for the highlight of the day, the demonstration of Portuguese speech recognition by David Hardisty and two of his master's students, Isabel Rocha and Joana Bernardo. Speech recognition is perhaps one of the most interesting, useful and exciting technologies applied to translation today, but its application is limited to the languages available, which are not so many with the popular Dragon Naturally Speaking application from Nuance. Portuguese is curiously absent from the current offerings despite its far more important role in the world than minor languages like German or French.

Professor Hardisty led off with an overview of the equipment and software used and recommended (slides available here); the solution for Portuguese uses the integrated voice recognition features of the Macintosh operating system. With Parallels Desktop 10 for Mac it can be used for Windows applications such as SDL Trados Studio and memoQ as well. Nuance provides the voice recognition technology to Apple, and Brazilian and European Portuguese are among the languages provided to Apple which are not part of Nuance's commercial products for consumers (Dragon Naturally Speaking and Dragon Dictate).

Information from the Apple web site states that

Dictation lets you talk where you would type — and it now works in over 40 languages. So you can reply to an email, search the web or write a report using just your voice. Navigate to any text field, activate Dictation, then say what you want to write. Dictation converts your words into text. OS X Yosemite also adds more than 50 editing and formatting commands to Dictation. So you can turn on Dictation and tell your Mac to bold a paragraph, delete a sentence or replace a word. You can also use Automator workflows to create your own Dictation commands.

Portuguese was among the languages added with OS X Yosemite.

Ms. Bernardo began her demonstration by showing her typing speed - somewhat less than optimal due to the effects of disability from cerebral palsy. I was told that this had led to some difficulties during a professional internship, where her typing speed was not sufficient to keep up with the expectations for translation output in the company. However, I saw for myself how the integrated speech recognition features enable her to lay down text in a word processor or translation environment tool as quickly as or faster than most of us can type. And she does so in Portuguese, a language I had thought was not available for such work by my colleagues translating into it.

A week before I had visited Professor Hardisty's evening class, where after my lecture on interoperability for CAT tools, Ms. Rocha had shown me how she works with Portuguese speech recognition as I do, in "mixed mode" using a fluid work style of dictation, typing, and pointing technology. She said that her own work is not much faster than when she types, but that the physical and mental strain of the work is far less than when she types and the quality of her translation tends to be better, because she is more focused on the text. This greater concentration on words, meaning and good communication matches my own experience, but I don't necessarily believe her about the speed. I don't think she has actually measured her throughput. My observation after the evening class and again at the event with SDL was that she works as fast as I do with dictation, and when I have a need for speed that can go to triple my typing rate or more per hour.

In any case, I am very excited that speech recognition is now available to a wider circle of professionals, and with integrated dictation features in the upcoming Windows 10 (a free upgrade for Windows 8 users), I expect this situation will only improve. I cannot emphasize enough the importance of this technology for improving the ergonomics of our work. It's about more than just leveling the field for gifted colleagues like Joana Bernardo, who can now bring to bear her linguistic skills and subject knowledge at a working speed on par with other professionals - or faster. For someone like me who often works with pain and numbness in the hands from strain injuries, or for all the rest of you banging away happily on keyboards, with an addiction to pain meds in your future perhaps, speech recognition offers a better future. Some are perhaps put off by the unhelpful, boastful emphasis of others on high output, which anyone familiar with speech recognition and human-assisted machine pseudo-translation (HAMPsTr) editing knows is faster and better than what any processes involving human revision of computer-generated linguistic sausage can produce, but it's really about working better and doing better work with better personal health. It's not about silly "Hendzel Units".

It has been pointed out a few times that Mac dictation or other speech recognition implementations lack the full range of command features found in an application like Dragon Naturally Speaking. That's really irrelevant. The most efficient speech recognition users I know do not use a lot of voice-controlled commands for menu options, etc. I don't bother with that stuff at all but work instead very comfortably with a mix of voice, keyboard and mouse, as I learned from a colleague who can knock off over 8,000 words of top-quality translation per short, restful day before taking the afternoon off to play with her cats or go shopping and spend some of that 6-figure translation income that she had even before learning to charge better rates.

Professor Hardisty also gave me a useful surprise in his talk - a well-articulated suggestion for a much more productive way to integrate machine translation in translation workflows:

David Hardisty's "pre-editing" approach for MpT output

The approach he suggested is actually one of the techniques I use with multiple TM matches in the working translation grid where I dictate: look at a match or TM suggestion displayed in a second pane, cherry-pick any useful phrases or sentence fragments, simply speak them along with selected term suggestions from glossaries, etc., and do it right the first time, faster than post-editing. This does work, much better than the sort of nonsense pushed too often into university curricula now by the greedy technotwits and Linguistic Sausage Purveyors, who in their desire for better margins and general disrespect of human service providers and employees fail to understand that good people, well-treated and empowered with the right tools, will beat the software and hardware of "MT" and its hamsterized process extensions every time. Hardisty's approach is the most credible suggestion I have seen yet for possibly useful application of machine pseudo-translation in good work. Don't dump the MpT sewage directly into the target text stream like so many do as they inevitably and ignorantly diminish the level of achievable output quality.

After the lunch break, Paul Filkin gave an excellent Q&A clinic on Trados Studio features, showing solutions for challenges faced by users at all levels. It's always a pleasure to see him bring his encyclopedic knowledge of that difficult environment to bear in poised, useful ways to make it almost seem easy to work with the tools. I've sent many people to Paul and his team for help over the years, and none have been disappointed according to the feedback I have heard. The Trados Studio "clinic" at Universidade Nova reminded me why.

Finally, in the last hour of the day, I presented my perspective on how the SDL Trados Studio suite can integrate usefully in teamwork involving colleagues and customers with other technology and how over the years as a user of Déjà Vu and later memoQ as my primary tool, the Trados suite has often made my work easier and significantly improved my earnings, for example with the excellent output management options for terminology in SDL Trados MultiTerm.

I spoke about the different levels of information exchange in interoperable translation workflows. I have done so often in the past from a memoQ perspective, but on this day I took the SDL Trados angle and showed very specifically, using screenshots from the latest build of SDL Trados Studio 2014, how this software can integrate beautifully and reliably as the hub or a spoke in the wheel of work collaboration.

The examples I presented involved specifics of interoperability with memoQ or OmegaT, but they work with any good, professional tool. (Please note that Across is neither good nor a professional translation tool.) Those present also left with interoperability knowledge that no others in the field of translation have as far as I know - a simple way to access all the data in a memoQ Handoff package for translation in other environments like SDL Trados Studio, including how to move bilingual LiveDocs content easily into the other tool's translation memory.

Working in a single translation environment for actual translation is ergonomically critical to productivity and full focus on producing good content of the best linguistic character and subject presentation without the time- and quality-killing distractions of "CAT hopping", switching between environments such as SDL Trados Studio, memoQ, Wordfast, memSource, etc. Busy translators who learn the principles of interoperability and how to move the work in and out of their sole translation tool (using competitive tools for other tasks at which they may excel, such as preparing certain project types, extracting or outputting terminology, etc.) will very likely see a bigger increase in earnings than they can by price increases in the next decade. On those rare occasions where it might be desirable to use a different tool or to cope with the stress of change from one tool to another, harmonization of customizable features such as keyboard shortcuts can be very helpful.

I ended my talk with a demonstration of how translation files (SDLXLIFF) and project packages (SDLPPX) from SDL Trados Studio can be brought easily into memoQ for translation in that ergonomic environment, with all the TMs and terminology resources, returning exactly the content required in an SDLRPX file. Throughout the presentation there was some discussion of where SDL and its competitors can and should strive to go beyond the current and occasionally dubious levels of "compatibility" for even better collaboration between professionals and customers in the future.

One of the attendees, Steve Dyson, also published an interesting summary of the day on his blog.

Aug 3, 2013

memoQ&A: How do I leverage the pretranslated SDLXLIFF content?

Given the interesting and surprising answers I received in previous two-stage "quiz posts" in which a challenge was posed for others to answer before I present my approach, I have decided to try a series of such posts. I've polled a few friends about a possible name for this series - memoQuiz, memoQ&A, CATquiz or perhaps something else. The first two choices suggest that memoQ would be the focus, but despite the impressions some may have of my publication habits, memoQ is far from my only concern with productivity involving the software we use for translation processes. So I'll leave that question open for now and use the current "vote leader". Arguments for and against in the comments are welcome.

Today's "quiz" is inspired by my continuing research into the current status of interoperability between SDL Trados Studio and memoQ 2013. As Kilgray has continued to upgrade the quality of its filters and other features for working with files from other platforms, SDL advocates have been increasingly at pains to find the rare, exceptional cases that do not work well or at all and present these as "common" and proof that we should all just bow down and kiss the One Ring ;-) The latest variant of that theme which I saw involved tracked changes displayed in source segments of the translation grid. It was fascinating, really, but a bit bizarre and utterly outside anything in my experience with 13 years of commercial translation. I'm not about to torture myself with an unergonomic application if a simpler one covers most of my professional needs. The Pareto principle rules.

Here's the scenario:

You receive a pre-translated SDLXLIFF file with segments of various status. Some are pretranslated fuzzy matches, some are not-yet-approved or even rejected pre-translated segments or ones which the outsourcer confirmed (but did not "approve") before sending the file. And some segments have not been translated at all.
The outsourcer just went on holiday and forgot to send you the translation memory!
You want to be able to use the pretranslated and approved content in the SDLXLIFF file as a reference while you translate this file and others. How can this be done???
Here is the file to translate. It is an English source text being translated into German.

The file to translate as seen in SDL Trados Studio.

Thank you to those who contributed their suggestions in the comments! Here is how I approached the problem:

The bilingual file I was given has different "qualities" of translated segments. There are unconfirmed (and possibly dodgy) sentences, including a "rejected" 100% match, translated (confirmed) segments and approved (proofread) segments. A TM in memoQ gives me no opportunity to differentiate match quality based on row status. LiveDocs does!

So I send the SDLXLIFF file received to a LiveDocs corpus on a "temporary" basis, where I use special settings to apply a fairly heavy penalty to unconfirmed segments, a mild penalty to translated (confirmed) but unapproved (not proofread) segments and no penalty at all to the parts which have already been checked and approved.
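The idea of status-dependent penalties can be sketched outside memoQ as well. The following Python fragment is only an illustration of the principle, not how LiveDocs works internally. It assumes that each segment's status is stored in a conf attribute on the sdl:seg elements of the SDLXLIFF file (with values such as Draft, Translated and ApprovedTranslation), and the penalty figures are hypothetical numbers chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical penalties in percentage points, chosen only for illustration.
PENALTIES = {
    "ApprovedTranslation": 0,   # proofread and approved: no penalty
    "ApprovedSignOff": 0,
    "Translated": 5,            # confirmed but not proofread: mild penalty
}
DEFAULT_PENALTY = 20            # draft, rejected or unknown status: heavy penalty


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def segment_statuses(xml_text):
    """Map each sdl:seg id to its status (assumed to be the 'conf' attribute)."""
    root = ET.fromstring(xml_text)
    return {
        element.get("id"): element.get("conf", "Unspecified")
        for element in root.iter()
        if local_name(element.tag) == "seg"
    }


def penalized_match(base_rate, status):
    """Reduce a match rate by the penalty for the given segment status."""
    return max(0, base_rate - PENALTIES.get(status, DEFAULT_PENALTY))
```

With these example values, a 100% match from an approved segment stays at 100%, one from a merely confirmed segment drops to 95%, and one from a draft or rejected segment drops to 80%, which keeps the dodgy content visible but clearly ranked below the proofread material.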

Details of the settings configuration and an example of how these settings apply to the SDLXLIFF file used as an example are shown in the video below. A similar approach can be applied to any bilingual file (or translation stored in LiveDocs) where there may be significant differences in segment status.

Time index to the video tutorial:
0:30 Creating a new LiveDocs settings profile
1:05 Editing the new LiveDocs settings profile
1:29 Match threshold settings
2:16 Alignment penalties
3:01 Bilingual document penalties
3:45 Penalty for unfinished alignments
4:24 Sub-language difference penalty
4:57 A "tour" of the row status for segments in the SDLXLIFF
6:20 Adding the translation file to the LiveDocs corpus
7:14 Applying the new LiveDocs settings to the LiveDocs corpus used
8:00 How the new LiveDocs settings work for matches in the translation window
9:22 Advantages of using LiveDocs rather than a translation memory

Aug 2, 2013

Translating SDL Trados Studio SDLXLIFF files & more in memoQ!

My latest demonstration video actually covers a number of memoQ features so that I would have an excuse to create this video index:

Time Description
0:32 Importing the first SDLXLIFF file to memoQ
1:12 Exporting the finished translation
1:27 Viewing the translation in SDL Trados Studio 2009
1:40 Re-importing the edited translation for a TM update
3:24 Saving the translation in a LiveDocs corpus for later reference
3:55 Importing a new version of the text in an SDLXLIFF source file
4:25 Comparing source text versions
5:55 Document-based pretranslation ("X-Translate")
7:11 Examining a "warning" for forgotten tags
7:46 Results of the second translation in SDL Trados Studio

That is the sort of thing I was talking about in a recent blog post about new approaches for online instruction. Many times I have wished for just such an index for long webinars or even much shorter reference videos like this one.

This tutorial was inspired by a Skype chat with a colleague in the US a few days ago. She uses memoQ but works with a number of others who use various versions of SDL Trados Studio, and there were some questions about how one might deal with TM updates after a translation as well as the inevitable new versions that legal and financial translators often encounter.

I have also noticed that quite a number of people are not up to date on SDLXLIFF compatibility with memoQ; this video also shows that former issues with preserving segment status have been taken care of, and everything now works well.

What is not obvious in the video is that one can also change the segmentation of the SDLXLIFF in memoQ; this happens only in the memoQ environment to allow better translation and more sensible translation memory content, and when the SDLXLIFF file is exported from memoQ, the original segmentation from Trados is preserved in the Trados environment.

Also not shown in the video is how I imported a third version of the source text, this time as a Microsoft Word file, not an SDLXLIFF. The document-based pre-translation (X-Translate) worked perfectly, and the target file was exported in the proper format (DOCX).

There are, of course, many other ways one could handle a "project" like this, but the procedure shown is not unlike what I sometimes do in projects myself.

********

I apologize for the quirky click animation in this tutorial; Camstudio had some problems I have never encountered before, and I'll have to get to the bottom of that if I keep using that tool. Otherwise, the video quality is probably the best I have achieved so far, and I would like to thank the friend who revealed the "secret" of better quality video for YouTube.

Jul 6, 2013

Exporting from an SDL Trados Studio package project in memoQ

In memoQ 6.2 and later versions, Kilgray has enabled the import of SDL Trados Studio project packages, SDLPPX files, for translation in memoQ. A wizard automatically creates a memoQ project from the SDL package, and TM information which is included is transferred to a memoQ TM. Other information, such as MultiTerm terminologies, analysis files and QA instructions, is not included, however. So this solution is not suitable for every case.

However, it is more than adequate to handle the dozen or so SDLPPX files I've been given to translate myself so far. Most of my clients do not use very many of the features in SDL Trados Studio, so typically files to translate and a translation memory are all they try to send to me. If there is more, I can always use my licensed copy of the SDL software as an intermediate stage.

However, once a project package has been imported to memoQ and translated, there is sometimes trouble with the return package. Typically, some files in a multi-file project are "forgotten".

To include all files in the SDL Trados Studio return package, you must select all of them, and then click Export (stored path). This will create the SDLRPX file in the same folder from which you imported the SDLPPX file. Only selected files will be included in the package.

The other option for exporting project files is Export (dialog). This will export individual SDLXLIFF files for the translation documents in the memoQ project.
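Before sending a return package back, a quick check for "forgotten" files can be scripted. The sketch below rests on an assumption: that the SDLRPX return package is a ZIP archive like the SDLPPX project package, and that comparing the names of the SDLXLIFF files in both is a fair test of completeness. The function names are my own, not part of any SDL or Kilgray tool.

```python
import zipfile
from pathlib import PurePosixPath


def sdlxliff_names(package_path):
    """Return the set of SDLXLIFF file names (without folders) in a package."""
    with zipfile.ZipFile(package_path) as package:
        return {
            PurePosixPath(member).name
            for member in package.namelist()
            if member.lower().endswith(".sdlxliff")
        }


def missing_from_return(project_package, return_package):
    """List SDLXLIFF files present in the SDLPPX but absent from the SDLRPX."""
    missing = sdlxliff_names(project_package) - sdlxliff_names(return_package)
    return sorted(missing)
```

An empty list from missing_from_return("job.sdlppx", "job.sdlrpx") would suggest that every translation file made it into the return package; any names listed are candidates for files that were not selected before the export.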

Here is a short video on YouTube showing this process:

Sep 4, 2012

memoQ 6.0.55: The Great Leap Forward with a Client API

Yesterday in the Yahoogroups forum, Kilgray's COO quietly announced the release of a new build of memoQ, which contains some very significant additions and improvements.

Important: memoQ 6.0.55 released
Mon Sep 3, 2012 12:12 pm (PDT). Posted by: "Istvan Lengyel"

Hi All,

Sorry for the long silence since the previous memoQ build - we had something in the making. memoQ 6.0.55 was uploaded to our website today, you can download and install it. This build now supports 64-bit installation, however, we have an issue with AutoUpdate which we may or may not be able to solve (it's third-party software), so for the time being you have to install 6.0.55 yourself from the website, and it may remain so in the future if we can't fix this. Therefore AutoUpdate is not available for an indefinite amount of time. Besides numerous bugfixes, there is new functionality added:

- the long-awaited SDLXLIFF filter,

- a client-side API - only for users of the project manager edition (hello Paul :)),

- on the server side, the possibility to use FirstAccept from content-connected projects.

We did not release this as a new version as the number of features does not qualify for a full new upgrade. I hope it will meet your expectations.

István

There were actually many other improvements, including a great number of fixes to bugs in the concordance and LiveDocs which were driving me nuts. Also, the performance for importing very large XLIFF files (think EU DGT scale!) was improved by an order of magnitude, though to see the full benefit one needs a 64-bit operating system and lots of RAM.

The SDLXLIFF filter should be helpful to the many memoQ users who translate files created in SDL Trados Studio. It has been possible to read this format with the standard XLIFF filter since it first appeared, but this new filter offers better results.

I am particularly excited by possibilities offered by the new client application programming interface (API). This will enable certain functions of memoQ to be run from other applications, even when memoQ is not active. Available features include analysis and TM functions; I've seen three simple lines of macro code in Microsoft Word that will export a memoQ TM to TMX, for example. I think this will lead to many interesting extensions of memoQ functionality and automation. Note that the API is only available in the Project Manager edition, but the additional cost of that version is less than I paid for my Déjà Vu X Workgroup upgrade years ago, and for those who work with multiple target languages or who need additional features for outsourcing and collaboration, an upgrade to memoQ Project Manager makes sense anyway.

And who knows? With rumors of SaaS resources for memoQ on the horizon, I can imagine more reasons to upgrade from memoQ Translator Pro.

Addendum: The documentation for the Client API is found at C:\ProgramData\MemoQ\SDK

Polish colleague Marek Pawelec also commented in the Yahoogroups list:

I'm happy to report that in version 6.0.55 you can import .sdlxliff files with mapping segment states without any hassle (see SDLXLIFF tab in filter settings) and if appropriate option is selected (States tab, Map memoQ states to XLIFF states on export, select SDLXLIFF in Source drop-down), memoQ states are properly mapped back to Studio states.

May 19, 2012

Dissecting SDL Trados Studio project files (SDLPPX) for translation with other tools

When a translation request with an SDLPPX (SDL Trados Studio project file) shows up in my inbox, it's always a bit irritating. The current version 5 of memoQ can't do a thing with these project files, unlike those from Star Transit, where a nicely automated wizard sets up a memoQ project with everything I need except terminology. To translate the content (SDLXLIFF files) of an SDL project file, you have to take the thing apart.

Of course, if you own an SDL Trados Studio license, it's usually a simple matter to open the package with Trados and export the resources you need. But today that didn't work. An error message informed me that the PPT source file for one of the SDLXLIFF resources was missing. Indeed. It was sitting on an FTP server to which the PM had failed to give me the access data before the weekend. Looked like I was SOOL.

In the past, when I took these SDLPPX files apart manually to get at the components I wanted, my luck was mixed. These are just ZIP files, so if you take a project file named MyWonderfulSubcontractedJob.sdlppx and rename it MyWonderfulSubcontractedJob.zip you can unpack it with WinZip or other utilities. Inside the ZIP file, the structure will look something like this:

Inside an SDL Trados Studio project package with Source language German (DE) and target language English (UK)

Both the source and target language folders contain an SDLXLIFF file with the source content. But there's a catch. You must take the SDLXLIFF file from the target folder.
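For those who prefer a script to renaming and WinZip, the unpacking can be sketched in a few lines of Python using the standard zipfile module, which does not care about the file extension. This is a minimal sketch under one assumption: that the target-language folder sits at the top level of the package and that you know its name (shown here as "en-GB", a hypothetical example; check the actual folder names in your package).

```python
import zipfile
from pathlib import PurePosixPath


def target_sdlxliff_members(package_path, target_folder):
    """List the SDLXLIFF files stored under the target-language folder."""
    with zipfile.ZipFile(package_path) as package:
        return [
            member
            for member in package.namelist()
            if member.lower().endswith(".sdlxliff")
            and PurePosixPath(member).parts[0].lower() == target_folder.lower()
        ]


def extract_target_files(package_path, target_folder, output_dir):
    """Unpack only the target-folder SDLXLIFF files and return their names."""
    members = target_sdlxliff_members(package_path, target_folder)
    with zipfile.ZipFile(package_path) as package:
        for member in members:
            package.extract(member, output_dir)
    return members

# Example call (file and folder names are placeholders):
# extract_target_files("MyWonderfulSubcontractedJob.sdlppx", "en-GB", "unpacked")
```

Filtering on the target folder avoids picking up the copy in the source-language folder by mistake.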

Here's an example of a translation segment from the SDLXLIFF file in the source folder:

<trans-unit id="4e4fc380-8fac-4570-942b-a4bf6c4a4c7f"><source>Die neue Maschinenrichtlinie</source><seg-source>Die neue Maschinenrichtlinie</seg-source></trans-unit>

Notice anything missing? There is no tag set for target content. This is essentially a monolingual file. When imported into memoQ it will show zero segments! A look at the same translation unit in the SDLXLIFF file from the target language folder shows the difference (a bit more than just the target tags highlighted):

<trans-unit id="4e4fc380-8fac-4570-942b-a4bf6c4a4c7f"><source>Die neue Maschinenrichtlinie</source><seg-source><mrk mtype="seg" mid="560">Die neue Maschinenrichtlinie</mrk></seg-source><target><mrk mtype="seg" mid="560" /></target><sdl:seg-defs><sdl:seg id="560" /></sdl:seg-defs></trans-unit>

This second SDLXLIFF file will import fine into other tools like memoQ using the XLIFF filter and allow you to translate without difficulty. I had not noticed this before, because in the past, if the SDLXLIFF file I imported had no segments, I just opened it in SDL Trados Studio, copied source to target and resaved it, and the resultant file imported without trouble and showed all segments. It took a missing original file that Studio demanded to save changes for me to look at matters a bit more closely.
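A quick way to tell the two kinds of file apart before importing is to count the segment markers inside target elements. The Python sketch below does this with the standard library's ElementTree; it ignores XML namespaces so that it does not depend on the exact namespace URIs, and it relies only on the structure visible in the two examples above (mrk elements with mtype="seg" inside target). The function names are my own.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def count_target_segments(xml_text):
    """Count <mrk mtype="seg"> markers found inside <target> elements."""
    root = ET.fromstring(xml_text)
    count = 0
    for element in root.iter():
        if local_name(element.tag) != "target":
            continue
        for descendant in element.iter():
            if local_name(descendant.tag) == "mrk" and descendant.get("mtype") == "seg":
                count += 1
    return count

# Example use on a file from an unpacked package (path is a placeholder):
# with open("en-GB/brochure.docx.sdlxliff", "rb") as handle:
#     print(count_target_segments(handle.read()))
```

A result of zero suggests the essentially monolingual variant from the source folder, which is likely to show no segments after import; a positive count suggests the file from the target folder.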

I really do hope that a future version of memoQ will include a project import routine for these SDL projects similar to that for Star Transit projects. I am encountering SDLPPX files with increasing frequency due to the general lack of understanding of interoperable workflows among those living in the Trados ghetto, and this added functionality in my primary tool would be a great help.

What should an SDL Trados Studio user do to ensure a less troublesome collaboration with those who use other tools? Don't send a damned project file. Send SDLXLIFF files and export the relevant TMs to TMX. If you are part of the 1% of Trados users who have a clue what to do with terminology, export the MultiTerm data, if you have any, to a delimited format of some kind. Most tools can take it from there, and you'll get back your finished SDLXLIFF files to review.
