May 19, 2012

Dissecting SDL Trados Studio project files (SDLPPX) for translation with other tools

When a translation request with an SDLPPX (SDL Trados Studio project file) shows up in my inbox, it's always a bit irritating. The current version 5 of memoQ can't do a thing with these project files, unlike those from Star Transit, where a nicely automated wizard sets up a memoQ project with everything I need except terminology. To translate the content (SDLXLIFF files) of an SDL project file, you have to take the thing apart.

Of course, if you own an SDL Trados Studio license, it's usually a simple matter to open the package with Trados and export the resources you need. But today that didn't work. An error message informed me that the PPT source file for one of the SDLXLIFF resources was missing. Indeed. It was sitting on an FTP server to which the PM had failed to give me the access data before the weekend. Looked like I was SOOL.

In the past, when I took these SDLPPX files apart manually to get at the components I wanted, my luck was mixed. These are just ZIP files, so if you take a project file named MyWonderfulSubcontractedJob.sdlppx and rename it MyWonderfulSubcontractedJob.zip, you can unpack it with WinZip or other utilities. Inside the ZIP file, the structure will look something like this:
[Screenshot: inside an SDL Trados Studio project package with source language German (DE) and target language English (UK)]
Both the source and target language folders contain an SDLXLIFF file with the source content. But there's a catch. You must take the SDLXLIFF file from the target folder.
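
For those who would rather script this step than rename and unzip by hand, here is a minimal sketch in Python. It relies only on the fact stated above that the package is an ordinary ZIP archive. The function name and the folder name passed in are my own illustrative assumptions; list the package contents first to see what the target language folder is actually called in the package you received.

```python
import zipfile
from pathlib import Path


def extract_target_sdlxliff(package_path, target_folder, out_dir):
    """Extract the .sdlxliff files found under target_folder in an SDLPPX package.

    target_folder is the name of the target language folder inside the
    package (hypothetical here; check the archive listing for the real name).
    """
    prefix = target_folder.strip("/").lower() + "/"
    extracted = []
    # An SDLPPX is a ZIP archive, so zipfile can read it without renaming.
    with zipfile.ZipFile(package_path) as package:
        for name in package.namelist():
            lowered = name.lower()
            if lowered.startswith(prefix) and lowered.endswith(".sdlxliff"):
                # extract() returns the path of the file it wrote.
                extracted.append(Path(package.extract(name, out_dir)))
    return extracted
```

Called as extract_target_sdlxliff("MyWonderfulSubcontractedJob.sdlppx", "en-GB", "unpacked"), it would copy out only the SDLXLIFF files from a folder named en-GB, assuming that is what the target folder is called, and leave the source-folder copies alone.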

Here's an example of a translation segment from the SDLXLIFF in the source language folder:
<trans-unit id="4e4fc380-8fac-4570-942b-a4bf6c4a4c7f"><source>Die neue Maschinenrichtlinie</source><seg-source>Die neue Maschinenrichtlinie</seg-source></trans-unit>
Notice anything missing? There is no tag set for target content. This is essentially a monolingual file. When imported into memoQ it will show zero segments! A look at the same translation unit in the SDLXLIFF file from the target language folder shows the difference (a bit more than just the target tags highlighted):
<trans-unit id="4e4fc380-8fac-4570-942b-a4bf6c4a4c7f"><source>Die neue Maschinenrichtlinie</source><seg-source><mrk mtype="seg" mid="560">Die neue Maschinenrichtlinie</mrk></seg-source><target><mrk mtype="seg" mid="560" /></target><sdl:seg-defs><sdl:seg id="560" /></sdl:seg-defs></trans-unit>
This second SDLXLIFF file will import fine into other tools like memoQ using the XLIFF filter and allow you to translate without difficulty. I had not noticed this before, because in the past, if the SDLXLIFF file I imported had no segments, I just opened it in SDL Trados Studio, copied source to target and resaved it, and the resultant file imported without trouble and showed all segments. It took a missing original file that Studio demanded to save changes for me to look at matters a bit more closely.
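
A quick way to tell the two kinds of file apart before importing is to count how many translation units actually contain a target element. The sketch below is only an illustration of that check: the function names are mine, and matching elements by local name while ignoring XML namespaces is a simplification. It tests only for the presence of a target element as in the examples above, not for whatever criteria memoQ's XLIFF filter really applies.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def count_targets(sdlxliff_path):
    """Return (trans_units, units_with_target) for one SDLXLIFF file."""
    root = ET.parse(sdlxliff_path).getroot()
    units = [el for el in root.iter() if local_name(el.tag) == "trans-unit"]
    # A unit counts only if <target> is one of its direct children.
    with_target = sum(
        1
        for unit in units
        if any(local_name(child.tag) == "target" for child in unit)
    )
    return len(units), with_target
```

If the second number comes back as zero, you are probably looking at the copy from the source language folder.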

I really do hope that a future version of memoQ will include a project import routine for these SDL projects similar to that for Star Transit projects. I am encountering SDLPPX files with increasing frequency due to the general lack of understanding of interoperable workflows by those living in the Trados ghetto, and this added functionality in my primary tool would be a great help.

What should an SDL Trados Studio user do to ensure a less troublesome collaboration with those who use other tools? Don't send a damned project file. Send SDLXLIFF files and export the relevant TMs to TMX. If you are part of the 1% of Trados users who have a clue what to do with terminology, export the MultiTerm data, if you have any, to a delimited format of some kind. Most tools can take it from there, and you'll get back your finished SDLXLIFF files to review.

13 comments:

  1. Hi Kevin,

    Interesting article. I think you missed an easy route for interoperability with most tools from Studio and not just those able to "read" XLIFF. Use the SDLXLIFF to Legacy Converter that will convert the entire Project into TTX or Bilingual Doc(x). These have been around so long most tools can handle them.

    An added advantage of this method is that you get to not export locked segments, or specific statuses you are not interested in and this can be useful since memoQ won't respect these anyway and you risk working on something that wasn't intended.

    You can then use the same tool to update the SDLXLIFF with the TTX or Bilingual doc(x) when you're done.

    This tool is brilliant because it helps to work around so many of the failings of XLIFF in terms of how it is interpreted by each vendor.

    Just one other comment I would make is on your view of less troublesome collaboration. The only way to ensure this 100% is to use Studio..! Anything else is always troublesome to some degree and potentially time wasting for the client. The same probably goes the other way too when starting life with any other tool. This type of interoperability where the tools cannot handle things properly for you is asking for trouble unless you are really comfortable with the process and the implications in both tools.

    Regards

    Paul

  2. The problem is that the SDLPPX “transports” so much more than just the translatable files and the translation memory (and the termbase if in use). It can also include “project metadata” like deadlines, comments/instructions from the PM to the translator, QA settings, verifier settings, in case of XML files also things like embedded DTDs/Schemas and along with them validation settings or XSLT(s) for xml preview settings. Also Studio can add multiple TMs to a project, arrange them with priorities and penalties and specify which of those TMs to update with your translations and so on. Not to mention AutoSuggest dictionaries, custom spell checkers, custom RegEx checkers etc.
    Not all of this will be used in all projects, obviously. But the danger of missing them and running into serious problems should not be underestimated. So, unless there is a proper SDLPPX import/export mechanism, just extracting the SDLXLIFF and the TM may fit in some cases. But especially when it’s about XML files, it should be discussed in detail with the PM (or his/her Language Engineer) to cover all requirements like xml validation properly.

  3. Kevin, the code does not appear in your samples. Maybe you can "mask" the code? That is, write all < as &lt; and all > as &gt; Then it should show up.

  4. Thanks for the heads-up on the display problem, Stefan. I wrote the post in a hurry before a meeting and overlooked that entirely. "Publish in haste, repent at leisure...."

  5. @Paul: Although your "legacy converter" sounds like an excellent tool, I have serious doubts that converting XLIFF files to ancient proprietary Trados formats is really a sustainable way forward, especially as SDL has announced the "death" of SDL Trados 2007 technology with the next major release of Studio. Moreover, the need to own a license to SDL Trados Studio to make use of this option will pose a barrier to a great many of those who need it. And in the specific case I mentioned here, I wonder if the missing source file would cause trouble; I did in fact open the package with my copy of SDL Trados Studio and I was unable to save files externally. Despite the sophistication of the current generation of SDL's technology, there are simply too many bugs and quirks dealing with it in routine cases I encounter. The added complexity of things like workarounds required to import many simple XLIFF files to Studio drives me to despair at times, though it does give me the opportunity to earn a bit with consulting. But the issues you have pointed out with file compatibility must be taken seriously (on all sides), and I hope that SDL and others will reconsider their perceived proprietary interests and cooperate better to ensure more painless collaboration between translators and translation clients using various tools.

    @Stefan: Most of that ancillary data is of no interest to me. I don't need SDL's AutoSuggest dictionaries to translate the XLIFF file. Nor do I particularly care about the internal QA settings; I have my own QA profiles configured in memoQ, and I doubt that my clients using SDL Trados Studio even understand how to use those features given some of the small issues they stumble over routinely trying to use that environment. The instructions for the job are usually clear in my dealings, typically something like "translate this manual by the end of next week, please". Typically I have received these packages without them even containing a TM worthy of note, as I have maintained my own TMs for those clients for years, and mine are often much cleaner than the version maintained by the client. The whorehouse TMs one often finds at agencies, where any number of translators have had their way with the data, do not inspire me to give them any priority over my own resources. In most cases I have encountered to date, the main need is to get at a translatable file, the SDLXLIFF from the target language folder. The case of a sophisticated client package is an interesting one in theory, but the real cases I encounter can typically be reduced to a very practical solution with just a bit more communication required if additional data are needed. The greatest need for a sophisticated client is to find a viable way of collaborating with a translator having the right knowledge of the subject matter, and that should be done with an environment most ergonomically functional for the person translating.

    If an XML file has been imported to SDL Trados Studio with the appropriate settings, the resultant SDLXLIFF should not require this information to be translated in another environment. I can trick out previews with an XSLT transform myself if I really feel the need, but usually a decent PDF to show me the layout of the original will suffice for context.

    Everyone can come up with endless excuses why Tool X absolutely must be used for a particular task, but I seldom find that these arguments hold for typical cases. They are mostly straw arguments heaped under the stake to which some would see interoperability and user-friendly collaboration tied and burned. Translators and their customers deserve better than that.

  6. Hi Kevin,

    Of course it isn't the way forward. The way forward is for memoQ to handle the Studio flavour properly and when going the other way for an interested party, so someone who wants to use Studio for memoQ files, to use the OpenExchange to handle these filetypes properly. I separate out responsibilities purely because SDL have provided a platform to enable this. Of course you can always continue with complex CAT hopping scenarios attempting to successfully (and sometimes unsuccessfully) manually edit the files at the cost of the unhappy end client who wanted the work completed using the same tool they used in the first place... but dare I suggest that the vast majority of translators probably cannot do this properly?

    The "legacy converter" is a useful workaround because I think other tools handle TTX or Bilingual doc better than they handle someone else's XLIFF... but you're right it's not the way forward.

    It's interesting that the information posted by Stefan is of no interest for you. I think for most translators it should be as my view is that they should be working to the instructions of their customer rather than forcing their preferred workflow, possibly unbeknown to their customer. How many times do we see an advert saying "Must use Studio 2011" or "Must use... some other tool" only to see a post on a forum from a translator asking whether they can use their preferred tool instead? I think that without the knowledge you clearly possess this is a very undesirable situation because at the very least you need to put the file back into the originating tool and make sure that the resultant file complies with the instructions built into the package in the first place which at best means additional work for someone, and at worst means the package can't be successfully completed or the target file saved.

    The real answers are to either use the same tool in the first place, or improve the ability of the tool you wish to use to handle the specific flavour of the others. The reason I think the latter is because in my opinion a simplified version of XLIFF to enable better interoperability will probably never be simplified and we'll end up with yet another "standard", and even if we do it will only be adequate for some workflows due to the inherent limitations of a "simplified" standard... so not really a "standard"... just another filetype.

    I totally agree that customers deserve more... but waiting for agreement on "standards" is not it... delivering tools to manage the complexities of the differences is. Might as well use TTX ;-)

    Regards

    Paul

  7. Susan Starling, May 22, 2012 4:04 PM

    Hi Paul, Hi Kevin,

    Interesting discussion. Without going into great detail here, I have to wonder why it doesn't seem acceptable to Paul for translators to "force their preferred workflows" on customers, but it's apparently fine for clients to do exactly that. Why? I am a service provider, yes, and as such I of course need to take my customers' wishes into account. But they also need to respect my workflows. Over the years I have developed processes that enable me to do my job properly and efficiently, and the customer can best benefit from the particular services I can provide when I'm able to work as I see fit, in the tool or tools of my choice and using the resources I've maintained. This is why the customers need to provide interoperable data that can be read into any tool rather than obscure packages only usable in one tool, since Paul is of course correct in that unlike Kevin, most translators will not be able to properly extract the relevant data from these packages. And in fact all of my customers do provide the necessary data in a manageable form.

    According to Paul the answer is to either use the same tool as the customer, or improve the ability of the specific tools to deal with each others' formats. But here's a probably naive thought - why not do what the rest of the software industry has been doing for years and make applications - in this case anything that would be included in a proprietary package such as TMs, TBs, etc. - completely interchangeable between tools?

    I'm glad to hear that Paul agrees with Kevin's statement that customers deserve more but he left out a small part of that statement, as Kevin wrote that "*translators and* customers deserve more". :)

    Interoperability is the art of compromise, as our friend István put it. My addition to that is that compromise goes both ways.

    Best regards,

    Susan

  8. Hi Susan,

    I think I have no problem with anyone working to an agreed workflow. But I guess I've taken the view in this discussion that if I was a customer, and I spent time preparing my projects and making sure that all the checks and balances I wanted were in place, then I would indeed be pretty annoyed if someone decided to ignore all of this and do their own thing and as a result cause me problems.

    This is not the same as agreeing beforehand that we do something a little different because I want your services and am happy to accept you using your preferred workflow. In this case it's ok for me. Doing it unbeknown to a customer who specifically asked, and pays, for something else is not.

    In the absence of anything better then compromise is fine... but this is not the scenario I was discussing. Why compromise when you can facilitate something better that removes the need in the first place? This is exactly where we are coming from. As for completely interchangeable... I think one thing that sets this industry apart from all others is how much dependency there is on a multitude of software applications that are not CAT tools at all... so plugins, source formats, specific workflows that use these formats in a special way, server based resources etc. Each of these do their own thing, even if there are a few that are interchangeable with their neighbours, and in turn they can all be used for different types of work and workflows. CAT tool vendors try to support their customers in their endeavours to maximise their business goals and this inevitably results in "non standard" applications that don't collaborate very well. I think at a very basic level we can collaborate, but I doubt we'll ever do this completely... at least not until there is only one of us left ;-)

    But one thing I missed out is that this discussion will probably never conclude because we'll all be working online before it does... and then you'll have no choice at all (other than don't take the work)... the customers provide the reason for you being there in the first place :-)

    Regards

    Paul

  9. Oh well, here I am with a project created on Trados. I wanted to import it in memoQ because I translated part of it in the past and have the TMs and TBs for it. But I can't import it on memoQ and of course, I am part of the 99% of translators who can't make MultiTerm work. In my view, it's inconceivable working on a project without a term base. Try as I might, I can't work with MT - and I am not a technophobe or computer illiterate.

    But there is hope. The PM told me they are migrating to memoQ and it will be sooner rather than later. Maybe they are listening to us. So maybe I will have to bite the bullet and work on 2011. Not that I want it, but I don't have much time to fiddle about converting and reconverting only to be met in the end with unusable files. Unfortunately deadlines are our enemies in cases such as this.

    I agree with Susan that it's high time CAT companies start working together like other software developers. If a Mac can deal with Windows files and vice-versa, why can't we have this in the translation world? Open standards, that's what we need. Not closed and byzantine systems that only the developers and a couple of blessed few understand.

  10. Just found this and it worked using MateCat - I suggest letting the company know what they will get back before agreeing, formatting with pdf was a bit awkward though :(
    however, how do you get it back into a sdlppx format though?

    Replies
    1. The return format isn't SDLPPX, it's SDLRPX for a package. This article is a bit dated; in the meantime memoQ and some other tools read the SDL packages (and other package types) directly and produce return packages as well. I've seen an unfortunate tendency to "overdevelop" these compatibility features without sufficient testing of function sometimes, so ALWAYS do a roundtrip test before you take on a job to ensure that everything will actually work.

  11. What if there were a tool which converts all formats (sdlppx, ttx, PXF, fm, indd etc.) to a simple 2-column table in an rtf or Microsoft docx file, while keeping the most important features of SDL Studio? You could simulate work in Studio on Mac.

    P.s. The vision where we all have to translate online scares me to hell.

    Replies
    1. Now it is ready: dataxsl (c o m) - we have been using it for some time for colleagues who do not have Trados or Studio.

