Oct 11, 2010

Kirtee's TAUS review and the SDL APIs

Kirtee Vashee recently posted an interesting summary of the TAUS annual conference, which he attended. I myself have a hard time getting behind the TAUS goal of sharing massive amounts of data; it's sort of like asking your neighbors to swap garbage cans and contents. As many are discovering (sometimes painfully late), data quality actually matters. The amount of garbage I see in supplied TMs gives me about the same enthusiasm for sharing on a large scale that a swim in the Danube would inspire right now. Predictably, MT was also a big focus at the conference this year. Those who believe that MT will soon displace the professional translator might enjoy the recent post on Machine Translation and the Philosopher's Stone.

Some of Kirtee's comments referred to the "walled garden" of SDL technology and to the lack of openness and high cost of its API. I was a bit puzzled by this, as I had been hearing otherwise from various sources for some time, and my comment (in which discerning readers will note that my "n" key still hasn't been fixed) was met with an invitation to SDL to clarify the issue. I think this clarification is forthcoming. As I have been given to understand:
All that is needed is a license and then you have access to the API’s and the fully documented, and regularly updated, online SDK.  Just apply to the developer program, free of charge, and you’ll get the details.  This applies to desktop and server. 
Having been out of the development game for about eight years now, I can't comment on the quality or versatility of anyone's APIs except in the most general way. Nowadays I feel a sense of victory if I waste only a day writing a WSH script for a data transformation that should reasonably have taken me an hour. But I think it is still fair to say that all tool vendors have a long way to go on interoperability and that even the best APIs need to be expanded. I won't be satisfied until I see open source clients capable of connecting to and working with the Ontram, SDL, Kilgray, Atril and other servers. Otherwise, those of us working with these servers will have to deal with the nonsense of keeping track of the functions and changes in all of these environments. This is certainly not in the spirit of Saint Ludd, the patron of today's frustrated technovictim translators.


  1. Hi Kevin,

    I imagine you'd get your wish if the open source clients were as functional as their commercial counterparts, and if all commercial tools published their APIs.

    I think XLIFF and other standard file formats are par for the course today (or they should be), but not everyone has taken the same leap as SDL in improving the ability to interact with the applications themselves.

  2. "Standard"? That's part of the problem, Paul. Standards are not implemented in a standardized way, and Kirtee and others are right to complain about this, even if they are sometimes off the mark with statements about specific issues with particular companies.

    Let's take XLIFF and TMX. At memoQ Fest 2010 (where I hope to finally meet you in person next year), Angelika Zerfass gave a marvelous seminar on data migration and the interoperability of tools. It was so good, in fact, that I ditched my plans for a technical presentation on the same subject a few days later, because she had already covered most of what could be said. She did a detailed analysis of the structure of TMX and XLIFF as implemented by SDL, Kilgray and Atril. While at a basic level the data could be exchanged safely, there were serious issues of information loss in nearly every scenario.

    With regard to TMX, the way Kilgray embeds context information makes it potentially accessible to other tools (if those tools are adapted to read it). In the other cases, there is no chance of that. And I don't think anybody's XLIFF comes back marked as translated after an exchange workflow.

    These little things should not be that hard to coordinate. I know you have made some exemplary efforts at outreach and communication with other vendors, but policy issues like this need more than one white knight and his horse. Maybe all of you should gather in Barcelona in the crappy dead of winter this year, thaw out and get some better cooperation for interoperability going so everyone can enjoy its full benefit.

    I suspect that improved interchange and interoperability will grow the market enough that any shifts in market share will matter less than the fact that absolute sales for all decent providers should increase quite a bit.

  3. With regard to the "information loss" I mentioned in my previous comment, this loss is not always inevitable, but sophisticated data mapping is often required. If the teams at major vendors would get their acts together and cooperate on these issues, we wouldn't have to play these games and users would be much happier with their tools.

  4. Hear, hear! Also have a look at my presentation on extensibility in XLIFF, which shows how differently XLIFF is being implemented across vendors.

  5. Perhaps the better route is to extend the XLIFF specification rather than have groups of vendors working in isolation: the major vendors' standard (MVS), the minor vendors' standard (er... MVS), etc. Who are these vendors, and who decides which club they belong to?

    Your example refers to a feature that isn't even covered in the XLIFF standard (segment status), so vendors are forced to do their own thing with custom extensions, which the spec allows.

    I think everyone has their preferred way of working, and one tool for all is unlikely to do the job. We shouldn't remove the competitive aspects for the consumer anyway (nor a company's right to make money by developing a competitive advantage through doing something none of the others can). Besides, open source client tools simply have too far to go before they come close to tools like SDL Studio, Atril DVX or memoQ.

    I think sticking to the standard and ensuring at least a lossless roundtrip is the way to go. The problem extends much further than translation interoperability: making the cradle-to-grave process of information management really smooth is a bigger task, and I doubt many vendors outside of SDL give it much thought at all. So reaching agreement in our MVS would probably take some doing.

  6. @Paul
    Extending the XLIFF specification is indeed the way to go. I don't blame any vendor for using the extensibility feature; I blame the standard for being too loose because of it.

    There are actually two standard ways to define segment status in XLIFF, and they even allow user-defined values: the 'state' and 'state-qualifier' attributes of the 'target' element.

  7. @Thomas
    Indeed, and I think you have again demonstrated this problem very well.

    I'm not an expert on the XLIFF standard, so I don't want to get into a debate I would need constant help with. But a Translation Unit for us is really a Paragraph. This means it can contain many Segments, and we need to hold a different status against each one. As far as I am aware, the only allowable way to do this is with the mrk element in seg-source and in target, and this has no mechanism for status at all. So we're back to extensibility: we use the standard's extension mechanism to create allowable methods of handling status, and many other things, against each segment in this way.
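To make the status debate in the last two comments a bit more concrete, here is a minimal sketch, assuming XLIFF 1.2 and Python's standard xml.etree.ElementTree. It reads the standard unit-level 'state' and 'state-qualifier' attributes on 'target', then reads per-segment status by joining each mrk element (mtype="seg") to extension records by their 'mid'. The ext:seg-defs element, its namespace urn:example:vendor-ext, its 'status' values and the mapping table are all hypothetical, standing in for whatever private extension a given vendor actually uses; this is not any vendor's real format.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace. The "ext" namespace is HYPOTHETICAL, a stand-in
# for whatever private extension a given vendor actually uses.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
EXT_NS = "urn:example:vendor-ext"
NS = {"x": XLIFF_NS, "ext": EXT_NS}

# One trans-unit (a paragraph) holding two segments marked with <mrk mtype="seg">.
SAMPLE = f"""<xliff version="1.2" xmlns="{XLIFF_NS}" xmlns:ext="{EXT_NS}">
  <file original="demo.txt" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="p1">
        <source>Hello world. How are you?</source>
        <seg-source><mrk mtype="seg" mid="1">Hello world.</mrk> <mrk mtype="seg" mid="2">How are you?</mrk></seg-source>
        <target state="needs-review-translation"><mrk mtype="seg" mid="1">Hallo Welt.</mrk> <mrk mtype="seg" mid="2"/></target>
        <ext:seg-defs>
          <ext:seg mid="1" status="translated"/>
          <ext:seg mid="2" status="not-translated"/>
        </ext:seg-defs>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def unit_states(root):
    """Standard, unit-level status: 'state' and 'state-qualifier' on <target>."""
    result = {}
    for tu in root.iterfind(".//x:trans-unit", NS):
        target = tu.find("x:target", NS)
        if target is not None:
            result[tu.get("id")] = (target.get("state"), target.get("state-qualifier"))
    return result


def segment_states(root):
    """Per-segment status: target <mrk mtype="seg"> joined by 'mid' to hypothetical ext:seg records."""
    result = {}
    for tu in root.iterfind(".//x:trans-unit", NS):
        ext_status = {s.get("mid"): s.get("status") for s in tu.iterfind("ext:seg-defs/ext:seg", NS)}
        target = tu.find("x:target", NS)
        if target is None:
            continue
        for mrk in target.iterfind("x:mrk[@mtype='seg']", NS):
            mid = mrk.get("mid")
            result[(tu.get("id"), mid)] = {"text": mrk.text or "", "status": ext_status.get(mid)}
    return result


# HYPOTHETICAL mapping from private status values to XLIFF 1.2 predefined 'state' values.
VENDOR_TO_XLIFF_STATE = {"translated": "translated", "not-translated": "needs-translation"}


def collapse_to_unit_state(statuses):
    """A unit counts as 'translated' only if every segment maps to 'translated';
    unknown statuses (and an empty list) fall back to 'needs-translation'."""
    mapped = [VENDOR_TO_XLIFF_STATE.get(s, "needs-translation") for s in statuses]
    return "translated" if mapped and all(m == "translated" for m in mapped) else "needs-translation"


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(unit_states(root))  # {'p1': ('needs-review-translation', None)}
    segs = segment_states(root)
    print(collapse_to_unit_state([v["status"] for v in segs.values()]))  # needs-translation
```

The last step illustrates the information loss discussed above: once per-segment values are folded into a single unit-level 'state', a tool that only reads the standard attribute can no longer tell which segment of the paragraph still needs work.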

