Oct 5, 2011
Notes from the XLIFF symposium in Warsaw
Some months ago at memoQfest 2011 in Budapest, I was invited to attend the TM Europe conference in Warsaw, Poland, at which the Interoperability Now! initiative would be discussed. Although the initiative and its goals interest me very much, I was undecided about attending until a friend said simply that we should go, and the matter was decided. I had very little time to think about it after that, and in the end I came a day late and left a day later than planned, and two days after that I've still not made it home to my pigeons, Ajax and I having been kidnapped by an alien. My lack of real planning caused me to miss the XLIFF symposium on the first day, about which I heard many things from its participants, mostly positive. As the main conference was the most interesting event of its kind which I have ever attended, most notably for the quality of participants and presenters alike, I was quite intrigued to be told time and again that I should have heard some particular matter at the XLIFF symposium, because the information shared was so much deeper than the particular gold nugget I was admiring.
Thus I asked a new friend if he would be kind enough to share his impressions of the day I missed so that I might enjoy subsequent vicarious attendance and that others might share some of the lessons he took home with him. A single event is many when seen through different eyes, and this is one man's journey through the day. If you attended a different conference, please tell us about yours.
And now, without further ado, I offer you the notes he shared with me, and I thank him for his anonymous insights.
Here are my thoughts on a few of the presentations that struck a chord.
The most interesting translation buyer presentation was given by Kirill Soloviev, who runs localisation at Acronis, a small software company. He described the overhaul of his company's software localisation process, from sending a non-standard (simple text-based) resource format to the translation vendor, to converting to XLIFF in-house and sending that instead.
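To make the in-house conversion step concrete, here is a minimal sketch of what it might look like. It is not Acronis's actual converter: the talk notes do not describe the resource format, so the `key = value` layout, the `#` comment convention, the function name `resources_to_xliff`, and the default file name and languages are all assumptions for illustration. Only the XLIFF 1.2 element and attribute names come from the standard.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def resources_to_xliff(text, original="strings.txt",
                       source_lang="en", target_lang="de"):
    """Convert hypothetical `key = value` resource lines to XLIFF 1.2."""
    # Serialise XLIFF elements without a namespace prefix.
    ET.register_namespace("", XLIFF_NS)
    root = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.2"})
    file_el = ET.SubElement(root, f"{{{XLIFF_NS}}}file", {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")

    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (assumed convention)
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line; a real converter should report it
        key = key.strip()
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit",
                             {"id": key, "resname": key})
        source = ET.SubElement(unit, f"{{{XLIFF_NS}}}source")
        source.text = value.strip()

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(resources_to_xliff("# buttons\nbtn.ok = OK\nbtn.cancel = Cancel\n"))
```

The point of doing this in-house is that the party who understands the format's quirks writes the parser once, and the vendor only ever sees a format its tools already handle.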
To me, the eye-opening part is not the new, XLIFF-based process, which seems sensible and straightforward. It's the fact that the old process ever existed in the first place. I know it happens all the time, but it still baffles me that anyone would send an ill-specified in-house file format directly to a translation vendor, with only a prayer of getting it back intact. I guess when it comes time to translate, software companies underestimate how fragile their file formats are, or overestimate the engineering competence of their LSP. You can make it work, but (as Kirill showed) at a great price in time, money, and frustration.
The problems were what you would expect. Up front, the LSP charged engineering fees for the non-standard files. Acronis was locked into the only LSP that understood them. The files came back broken anyway. Often, this required sending them back to the vendor. The release was delayed unpredictably. Everyone was frustrated.
Kirill quantified the costs (except the frustration), which I appreciate. I don't have the numbers handy, but the engineering cost was the least significant. That alone would not justify the overhaul for years. (For Acronis, that is—given the number of back-and-forths required, the LSP probably lost money on engineering.) Beating lock-in was bigger. Acronis could now shop around. But more important than money was time. Testing corrupt files and back-and-forth with the LSP wasted employees' time and delayed releases. I wouldn't be surprised if they missed ship dates, though Kirill didn't say.
Whose fault was it? The buyer takes some blame for not foreseeing the misinterpretation and corruption of their format. This is typical of the somewhat sloppy internationalisation practices of software developers. But it's understandable, because localisation is not their business. The vendor takes the weight of the blame, in my opinion, for accepting the job without completely understanding the format. LSPs seem to assume it is their problem to deal with random formats, yet they overestimate their technical ability to do it correctly, and fail to set expectations (and pricing) accordingly. If this were estimated diligently before signing the contract, both parties would probably see it's cheaper for Acronis to process their in-house format themselves.
Lesson: The more standard the format you send to the translation vendor, the better results you should expect. XLIFF is the logical extension (in theory at least).
One other observation is that a pure-XLIFF delivery appears to work well for resource files because they have no context to preserve. So converting to XLIFF doesn't lose anything. But this success is somewhat illusory, because (as Kirill acknowledged) the lack of context is a problem for translation quality. This is an important problem that neither Acronis nor XLIFF has solved.
On XLIFF itself there were two panels, which I might title "The Problem" and "The Solution". "The Problem" I might subtitle with a quote from panelist Angelika Zerfass: "What do you mean? What are we supposed to do?" Angelika trotted out a menagerie of freakish XLIFF that would make the stoutest Technical Committee member shiver. Angelika is a localisation trainer, and all of these mutants come from her real-life experiences.
This led to a lively exchange, dominated by echoes that XLIFF is not clearly defined, and implementations regularly get it wrong. Producers see XLIFF as a great place to dump data, without realising how many assumptions they are making about how the data will be treated.
Angelika called for education, guidance, examples, and use cases for XLIFF producers from the standard, and nobody disagreed that this is sorely lacking. The point I would question is her claim that producers need to understand the process that their XLIFF will go through. I believe if XLIFF were well-specified, XLIFF producers could be confident about how their data will be treated—and what they can expect back at the end—while remaining relatively ignorant of the translation process. But this may be hopeful speculation on my part.
Daniel Benito (creator of Déjà Vu) made some points about the lack of workflow information in XLIFF, and claimed that alt-trans was designed without thought for how tools would deal with it. This led to a notable comment from someone not on the panel that XLIFF was designed without tools developers involved at the beginning. That explains a lot.
Shirley Coady (MultiCorpora) noted that the lack of a compliance test for XLIFF allows producers to create whatever they want and call it XLIFF. LSPs (and even tool vendors) seem to accept they are stuck dealing with it. This is the same backwards thinking I pointed out regarding Kirill's talk. LSPs should push back—but they could use a stronger standard, with compliance checking, to back them up. With strong standards, they can show clients that their XLIFF is wrong, and educate them on how to fix it. The world becomes a more harmonious place. (As someone from the audience noted, we could use more translators and PMs in the standards process to bring about this happily ever after.)
"The Solution" was a panel of XLIFF Technical Committee members David Filip, Shirley Coady, and Lucia Morado Vazquez, who explained how XLIFF 2.0 will bring about a golden age of localisation interoperability. OK, so that's a little optimistic, but let's look at what hope they offer for the future.
David cited the new requirement for a conformance clause in OASIS standards as an opportunity. He advocates a conformance clause that mandates processing requirements for every element. With XLIFF 2.0, you either conform or you don't; there is no ambiguity. Everyone seems in favor.
Also widely agreed is that XLIFF 2.0 will have a simpler core, with additional functionality enabled by modules. But the line between core and module has not been drawn. David indicated that the distinction may be guided by considering a set of core use cases. He pointed to the model of the UBL standard, which begins with 40 pages of "business processes" that are supported by the specification.
Some important questions remain. The first is how backwards-compatible XLIFF 2.0 will be. Shirley made a statement implying that 2.0 wouldn't be much of a change for tool developers. But I didn't hear consensus on this point, and it seems at odds with the goals of simplification and tighter specification. An earlier presentation showed parts of the current XLIFF 2.0 schema, and they had clearly diverged from the current standard. So the story on compatibility is unclear. There's a lot of fervor for a fresh start in 2.0. And maintaining backwards compatibility while at the same time removing ambiguity seems a major undertaking. But are implementors willing to rewrite their XLIFF support? I'm not sure anyone knows.
Also missing was any project data on XLIFF 2.0. XLIFF 1.2 was published in February 2008, over three and a half years ago. Yet there's not much visibility from the outside on the progress towards XLIFF 2.0. David estimated 2.0 in the second quarter of 2012, but I'm not sure how he arrived at this. David did call for feature owners to commit the resources necessary to do the work. And Lucia asked for more industry contribution. But neither answers the question of who is signed up for the core work. I hope I've just missed something, but I'm suspicious of the current dates given the scope of the goals.
Finally, I can't help noting the absence of most of the XLIFF Technical Committee from the Symposium. Not to dismiss the contributions of the others present, but David Filip was the only representative to make many meaningful statements about work being done. Missing were most of the heavy hitters: Committee Chair Bryan Schnabel, Yves Savourel, Rodolfo M. Raya, Arle Lommel, Christian Lieske (due to a last-minute emergency), and anyone from IBM. I don't mean to blame them. Budgets are tight, and it's hard to justify conference expenses. But it was a lost chance for the TC to communicate its message, and for the industry to engage them. With so much uncertainty around XLIFF 2.0, I think that's a real missed opportunity.
And in the miscellaneous category was a fascinating talk by Ian O'Keeffe. Ian made a compelling case for the increased importance of sound (sound effects, background sound, music) in virtual experiences, and predicted that sound will increasingly be localised to convey the intended meaning to other cultures. He's written a clever tool to vary a piece of music (tempo, pitch, rhythm, ...) and study how listeners from different cultures hear the differences. His provocative idea is that perhaps, instead of localising sound by choosing or composing a different sound, we might modify the original to evoke the right response.
He ties this to localisation and XLIFF by pointing out that whatever method is used, we need to know the intended meaning of a sound in order to localise it. He proposes putting metadata about the source sound into XLIFF. This is all fine, but sound is a red herring. We need translation unit metadata for all sorts of reasons. And we already have it. XLIFF 1.2 even enshrines some rather (ridiculously) specific metadata (coord attribute, anyone?). Moreover, it allows for extension within the tag, so anybody can put there whatever metadata they want. While there were a few extension-phobes in the audience, I don't see what alternative they could offer, or what harm they see in this kind of use.
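As a rough illustration of this kind of extension, the sketch below attaches an "intended meaning" attribute to a translation unit. It assumes the extension point in question is the `trans-unit` element, which in XLIFF 1.2 accepts attributes from other namespaces. The namespace `urn:example:sound-metadata`, the `snd` prefix, the `intent` attribute, and the helper `build_unit` are hypothetical names invented here; they are not part of XLIFF or of Ian's proposal.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
META_NS = "urn:example:sound-metadata"  # hypothetical extension namespace


def build_unit(unit_id, source_text, intent):
    """Build a trans-unit carrying a custom, namespaced metadata attribute."""
    ET.register_namespace("", XLIFF_NS)
    ET.register_namespace("snd", META_NS)  # hypothetical prefix
    unit = ET.Element(f"{{{XLIFF_NS}}}trans-unit", {"id": unit_id})
    # The extension: an attribute from a non-XLIFF namespace.
    unit.set(f"{{{META_NS}}}intent", intent)
    source = ET.SubElement(unit, f"{{{XLIFF_NS}}}source")
    source.text = source_text
    return unit


if __name__ == "__main__":
    unit = build_unit("sfx.door", "door_creak.ogg", "suspense")
    print(ET.tostring(unit, encoding="unicode"))
```

A tool that does not know the extra namespace can ignore the attribute, which is why this kind of use seems low-risk as long as such tools preserve what they do not understand.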