Oct 5, 2011

Notes from the XLIFF symposium in Warsaw

Some months ago, at memoQfest 2011 in Budapest, I was invited to attend the TM Europe conference in Warsaw, Poland, where the Interoperability Now! initiative would be discussed. Although the initiative and its goals interest me very much, I was undecided about attending until a friend said simply that we should go, and the matter was settled. I had very little time to think about it after that, and in the end I arrived a day late and left a day later than planned; two days later I still have not made it home to my pigeons, Ajax and I having been kidnapped by an alien. My lack of real planning caused me to miss the XLIFF symposium on the first day, about which I heard many things from its participants, mostly positive. The main conference was the most interesting event of its kind that I have ever attended, most notably for the quality of participants and presenters alike, so I was quite intrigued to be told time and again that I should have heard some particular matter at the XLIFF symposium, because the information shared there went so much deeper than the particular gold nugget I was admiring.

Thus I asked a new friend if he would be kind enough to share his impressions of the day I missed, so that I might attend it vicariously after the fact and others might share some of the lessons he took home with him. A single event is many when seen through different eyes, and this is one man's journey through the day. If you attended a different conference, please tell us about yours.

And now, without further ado, I offer you the notes he shared with me, and I thank him for his anonymous insights.


Here are my thoughts on a few of the presentations that struck a chord.


The most interesting translation buyer presentation was given by Kirill Soloviev, who runs localisation at Acronis, a small software company.  He described the overhaul of his company's software localisation process, from sending a non-standard (simple text-based) resource format to the translation vendor, to converting to XLIFF in-house and sending that instead.

To me, the eye-opening part is not the new XLIFF-based process, which seems sensible and straightforward.  It's the fact that the old process ever existed in the first place.  I know it happens all the time, but it still baffles me that anyone would send an ill-specified in-house file format directly to a translation vendor and simply hope to get it back intact.  I guess when it comes time to translate, software companies underestimate how fragile their file formats are, or overestimate the engineering competence of their LSP.  You can make it work, but (as Kirill showed) only at a high price in time, money, and frustration.

The problems were what you would expect.  Up front, the LSP charged engineering fees for handling the non-standard files.  Acronis was locked into the only LSP that understood them.  The files came back broken anyway, which often meant sending them back to the vendor.  Releases were delayed unpredictably.  Everyone was frustrated.

Kirill quantified the costs (except the frustration), which I appreciate.  I don't have the numbers handy, but the engineering fees were the least significant item; saving them alone would not have paid for the overhaul for years.  (For Acronis, that is; given the number of back-and-forths required, the LSP probably lost money on engineering.)  Escaping lock-in mattered more, because Acronis could now shop around.  But more important than money was time.  Testing corrupt files and going back and forth with the LSP wasted employees' time and delayed releases.  I wouldn't be surprised if they missed ship dates, though Kirill didn't say.

Whose fault was it?  The buyer takes some blame for not foreseeing the misinterpretation and corruption of its format.  This is typical of the somewhat sloppy internationalisation practices of software developers, but it's understandable, because localisation is not their business.  In my opinion the vendor bears the bulk of the blame, for accepting the job without fully understanding the format.  LSPs seem to assume it is their problem to deal with random formats, yet they overestimate their technical ability to do it correctly and fail to set expectations (and pricing) accordingly.  Had the work been estimated diligently before the contract was signed, both parties would probably have seen that it was cheaper for Acronis to process its in-house format itself.

Lesson: The more standard the format you send to the translation vendor, the better the results you can expect.  XLIFF is the logical extension of this idea (in theory, at least).
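To make the in-house conversion step a little more concrete, here is a minimal sketch in Python. It assumes a hypothetical "key=value" resource format (Kirill did not show Acronis's actual format, and this is not their tool), skips blank lines and "#" comments, and wraps each entry in an XLIFF 1.2 trans-unit whose id is the resource key. The function names parse_resources and to_xliff are my own.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialise XLIFF as the default namespace


def q(tag):
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def parse_resources(text):
    """Parse a hypothetical key=value resource file into (key, value) pairs.

    Blank lines and '#' comments are skipped; everything after the first '='
    is the value. Real formats (escapes, continuation lines) need more care.
    """
    pairs, seen = [], set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key:
            raise ValueError(f"line {lineno}: expected key=value, got {line!r}")
        if key in seen:
            # Duplicate keys would become duplicate trans-unit ids.
            raise ValueError(f"line {lineno}: duplicate key {key!r}")
        seen.add(key)
        pairs.append((key, value.strip()))
    return pairs


def to_xliff(pairs, original, source_lang, target_lang):
    """Wrap each (key, value) pair in an XLIFF 1.2 trans-unit, using the key as id."""
    root = ET.Element(q("xliff"), version="1.2")
    file_el = ET.SubElement(root, q("file"), {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, q("body"))
    for key, value in pairs:
        unit = ET.SubElement(body, q("trans-unit"), id=key)
        ET.SubElement(unit, q("source")).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "# main menu\nmenu.open = Open file\nmenu.quit = Quit\n"
    print(to_xliff(parse_resources(sample), "menu.properties", "en", "pl"))
```

Because the resource key becomes the trans-unit id, the translated file can be merged back by id, and the parser fails loudly on lines it does not understand instead of letting a vendor guess.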

One other observation is that a pure-XLIFF delivery appears to work well for resource files because they have no context to preserve.  So converting to XLIFF doesn't lose anything.  But this success is somewhat illusory, because (as Kirill acknowledged) the lack of context is a problem for translation quality.  This is an important problem that neither Acronis nor XLIFF has solved.


On XLIFF itself there were two panels, which I might title "The Problem" and "The Solution".  "The Problem" I might subtitle with a quote from panelist Angelika Zerfass: "What do you mean?  What are we supposed to do?"  Angelika trotted out a menagerie of freakish XLIFF that would make the stoutest Technical Committee member shiver.  Angelika is a localisation trainer, and all of these mutants come from her real-life experiences.

This led to a lively exchange, dominated by repeated complaints that XLIFF is not clearly defined and that implementations regularly get it wrong.  Producers see XLIFF as a great place to dump data, without realising how many assumptions they are making about how the data will be treated.

Angelika called for the standard to provide education, guidance, examples, and use cases for XLIFF producers, and nobody disagreed that these are sorely lacking.  The point I would question is her claim that producers need to understand the process their XLIFF will go through.  I believe that if XLIFF were well specified, XLIFF producers could be confident about how their data will be treated (and what they can expect back at the end) while remaining relatively ignorant of the translation process.  But this may be hopeful speculation on my part.

Daniel Benito (creator of Déjà Vu) made some points about the lack of workflow information in XLIFF, and claimed that alt-trans was designed without thought for how tools would deal with it.  This prompted a notable comment from someone not on the panel: XLIFF was designed without tool developers being involved at the beginning.  That explains a lot.

Shirley Coady (MultiCorpora) noted that the lack of a compliance test for XLIFF allows producers to create whatever they want and call it XLIFF.  LSPs (and even tool vendors) seem to accept that they are stuck dealing with it.  This is the same backwards thinking I pointed out regarding Kirill's talk.  LSPs should push back, but they could use a stronger standard, with compliance checking, to back them up.  With strong standards, they can show clients where their XLIFF is wrong and educate them on how to fix it.  The world becomes a more harmonious place.  (As someone in the audience noted, we could use more translators and PMs in the standards process to bring about this happily ever after.)
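As an illustration of what even a crude compliance check can catch, here is a minimal Python sketch that tests a handful of structural rules from XLIFF 1.2 as I read the specification: the version attribute, the required original, source-language and datatype attributes on file, a required id on each trans-unit that is unique within its file, and a source element in every unit. The function name check_xliff12 is hypothetical, and this is nowhere near a real conformance test; it says nothing about the processing requirements the panels were talking about.

```python
import xml.etree.ElementTree as ET

NS = "{urn:oasis:names:tc:xliff:document:1.2}"


def check_xliff12(xml_text):
    """Return a list of problems found; an empty list means these few checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != NS + "xliff":
        problems.append(f"root element is {root.tag}, expected xliff in the 1.2 namespace")
    if root.get("version") != "1.2":
        problems.append('xliff element lacks version="1.2"')

    for file_el in root.iter(NS + "file"):
        for attr in ("original", "source-language", "datatype"):
            if file_el.get(attr) is None:
                problems.append(f"file element missing required attribute {attr}")
        seen = set()  # trans-unit ids must not repeat within one file
        for unit in file_el.iter(NS + "trans-unit"):
            unit_id = unit.get("id")
            if unit_id is None:
                problems.append("trans-unit missing required attribute id")
            elif unit_id in seen:
                problems.append(f"duplicate trans-unit id {unit_id!r}")
            else:
                seen.add(unit_id)
            if unit.find(NS + "source") is None:
                problems.append(f"trans-unit {unit_id!r} has no source element")
    return problems
```

An LSP could run something like this on incoming files and return the list to the client, which turns "your XLIFF is broken" into a specific, fixable report.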


"The Solution" was a panel of XLIFF Technical Committee members David Filip, Shirley Coady, and Lucia Morado Vazquez, who explained how XLIFF 2.0 will bring about a golden age of localisation interoperability.  OK, so that's a little optimistic, but let's look at what hope they offer for the future.

David cited the new requirement for a conformance clause in OASIS standards as an opportunity.  He advocates a conformance clause that mandates processing requirements for every element, so that with XLIFF 2.0 you either conform or you don't, with no ambiguity.  Everyone seems in favor.

Also widely agreed is that XLIFF 2.0 will have a simpler core, with additional functionality enabled by modules.  But the line between core and module has not been drawn.  David indicated that the distinction may be guided by considering a set of core use cases.  He pointed to the model of the UBL standard, which begins with 40 pages of "business processes" that are supported by the specification.

Some important questions remain.  The first is how backwards-compatible XLIFF 2.0 will be.  Shirley made a statement implying that 2.0 wouldn't be much of a change for tool developers.  But I didn't hear consensus on this point, and it seems at odds with the goals of simplification and tighter specification.  An earlier presentation showed parts of the current XLIFF 2.0 draft schema, and they had clearly diverged from XLIFF 1.2.  So the story on compatibility is unclear.  There's a lot of fervor for a fresh start in 2.0, and maintaining backwards compatibility while also removing ambiguity seems a major undertaking.  But are implementers willing to rewrite their XLIFF support?  I'm not sure anyone knows.

Also missing was any project data on XLIFF 2.0.  XLIFF 1.2 was published in February 2008, over three and a half years ago, yet from the outside there is little visibility into progress towards XLIFF 2.0.  David estimated that 2.0 would be ready in the second quarter of 2012, but I'm not sure how he arrived at this.  David did call for feature owners to commit the resources necessary to do the work, and Lucia asked for more industry contribution.  But neither answers the question of who is signed up for the core work.  I hope I've just missed something, but given the scope of the goals, I'm skeptical of the current dates.

Finally, I can't help noting the absence of most of the XLIFF Technical Committee from the Symposium.  Not to dismiss the contributions of the others present, but David Filip was the only TC representative who said much of substance about the work being done.  Missing were most of the heavy hitters: Committee Chair Bryan Schnabel, Yves Savourel, Rodolfo M. Raya, Arle Lommel, Christian Lieske (due to a last-minute emergency), and anyone from IBM.  I don't mean to blame them.  Budgets are tight, and conference expenses are hard to justify.  But it was a lost chance for the TC to communicate its message, and for the industry to engage with it.  With so much uncertainty around XLIFF 2.0, I think that's a real missed opportunity.


And in the miscellaneous category was a fascinating talk by Ian O'Keeffe.  Ian made a compelling case for the increased importance of sound (sound effects, background sound, music) in virtual experiences, and predicted that sound will increasingly be localised to convey the intended meaning to other cultures.  He's written a clever tool to vary a piece of music (tempo, pitch, rhythm, ...) and study how listeners from different cultures hear the differences.  His provocative idea is that perhaps, instead of localising sound by choosing or composing a different sound, we might modify the original to evoke the right response.

He ties this to localisation and XLIFF by pointing out that, whatever method is used, we need to know the intended meaning of a sound in order to localise it.  He proposes putting metadata about the source sound into XLIFF.  This is all fine, but sound is a red herring: we need translation unit metadata for all sorts of reasons, and we already have it.  XLIFF 1.2 even enshrines some rather (ridiculously) specific metadata (coord attribute, anyone?).  Moreover, it allows extensions within the translation unit, so anybody can put whatever metadata they want there.  While there were a few extension-phobes in the audience, I don't see what alternative they could offer, or what harm they see in this kind of use.
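For a sense of what the existing mechanism looks like, here is a small Python sketch: an XLIFF 1.2 trans-unit carrying both the standard coord attribute (x;y;cx;cy) and an element from a custom namespace holding sound metadata of the kind Ian described. The namespace URI urn:example:sound-metadata, the snd:intent element, and its mood and cue attributes are invented for illustration; XLIFF 1.2 does not define them. The reader function picks out only the extension elements it knows about and passes over everything else.

```python
import xml.etree.ElementTree as ET

XLIFF = "urn:oasis:names:tc:xliff:document:1.2"
SND = "urn:example:sound-metadata"  # hypothetical extension namespace

# coord is a standard XLIFF 1.2 attribute; snd:intent is a made-up extension
# element placed after the XLIFF children of the trans-unit.
DOC = f"""<xliff version="1.2" xmlns="{XLIFF}" xmlns:snd="{SND}">
  <file original="level1.txt" source-language="en" datatype="plaintext">
    <body>
      <trans-unit id="door_creak" coord="10;20;200;40">
        <source>The door creaks open.</source>
        <snd:intent mood="ominous" cue="door_creak.wav"/>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_sound_intent(xml_text):
    """Map trans-unit id to the attributes of its snd:intent element, if any."""
    root = ET.fromstring(xml_text)
    result = {}
    for unit in root.iter(f"{{{XLIFF}}}trans-unit"):
        intent = unit.find(f"{{{SND}}}intent")
        if intent is not None:
            result[unit.get("id")] = dict(intent.attrib)
    return result


if __name__ == "__main__":
    print(read_sound_intent(DOC))
```

Nothing here required a change to the standard, which is the point: the hard part is agreeing on what the metadata means, not finding a place to put it.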

1 comment:

  1. I am very pleased to see such a complete and compelling report on the very successful XLIFF Symposium. This piece is very educational to me and I find it very useful. I, as one of the so called "heavy hitters" missing from the event, value this, and the many other reports I've received. I won't try to make excuses for my absence, or for the absences of the other members of the Technical Committee. In my case, let me just sum it up by saying, were I given the chance, I would have tried to move heaven and earth to attend - but this time, the chance was not there for me. I'll let the other cited missing members speak for themselves.

    But what I will comment on, and I hope this comes across as very positive news, is the role the XLIFF TC had in preparing for the event; and perhaps more importantly, the action set into motion as a result of the symposium. Behind the scenes the TC played a very active role in helping to design, plan, recruit, promote, and validate all aspects of the event. Several of us participated on the event team. The volunteers in our TC spent many hours of TC time, and many hours of private time to shepherd this event toward success.

    And as for what good comes afterward, let me start by talking about what we did after the first symposium held a year earlier in Limerick. After the event we spent many cycles harvesting the information we had gathered. We called it gathering the voice of our community. From this event was born the Promotion and Liaison Subcommittee. Members have created summary papers and have given debriefs at our TC meetings. We've published articles on the steps we are taking based on inputs from the first symposium. Several key concepts currently being developed for XLIFF 2.0 were directly born from inspiration and the voice of the community gathered in Limerick (such notions as core vs. extended modules, reexamination of extensibility, gap analysis between what is specified vs. what the tool makers are implementing via extension points, etc.). We heard loudly and clearly what's wrong with XLIFF (and what's right with XLIFF), and we seized upon it. Now, a year later, we have our second symposium under our belt, a fresh set of reports and resources (this blog entry among them), and renewed enthusiasm and gratitude for another dose of the voice of our community. We have already spent similar cycles on the analysis. I know that we will harvest every bit we can and turn what we get into action and goodness.

    Rest assured, we value these symposiums very much, and look forward to harnessing what we learn. So finally, I express immense gratitude to all who planned, supported, attended, and reported upon the 2nd Annual XLIFF Symposium in Warsaw!

    - Bryan Schnabel
    Chair, OASIS XLIFF Technical Committee

