Translation Tribulations: The fix is in for PDF charts

Feb 1, 2014


Over four years ago, I reviewed Iceni Infix after I began working with it. I'm not as strong a fan as some, because I generally have little enthusiasm for direct editing of PDFs and dealing with frequent problems such as missing unusual fonts and having to play the guess-my-optimum-font-substitution game, but I do find it useful in many situations. I found another one of those today.

A new client of a friend works with a horrible German program to produce reports full of charts. The main body of the text is written in Microsoft Word and is available as a reasonable DOCX file, but the charts are a problem, as they are available only in that oddball tool's own format or as PDF. Nobody wants to deal with that software, really. It is supported by no translation tools vendor I am aware of, and like another example of incompatible German software, Across, it enjoys the obscurity it deserves.

After thinking about the approach needed in this case, I realized that if the graphics could be isolated conveniently on pages, the XML export from the PDF document would contain only information from the graphics. After translation, the formatting could be touched up with Infix before making bitmap screenshots at an enlargement which would yield decent resolution when sized in layout. Of course, in projects involving multiple languages the same XML files can be reused for each target language, which is a great convenience.
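As a rough illustration of the enlargement arithmetic, here is a minimal Python sketch. It assumes the PDF viewer shows 96 screen pixels per inch at 100% zoom and that 300 pixels per inch is "decent" for the layout; both figures are assumptions for the example, not values from this project, and `required_zoom` is a hypothetical helper name.

```python
def required_zoom(target_dpi, screen_dpi=96.0, scale_in_layout=1.0):
    """Return the viewer zoom factor (1.0 = 100%) for a screenshot.

    target_dpi      -- resolution wanted in the final layout (assumed, e.g. 300)
    screen_dpi      -- pixels per inch the viewer shows at 100% zoom (assumed 96)
    scale_in_layout -- how large the chart is placed relative to its size on
                       the PDF page (0.5 = placed at half size)
    """
    if target_dpi <= 0 or screen_dpi <= 0 or scale_in_layout <= 0:
        raise ValueError("all arguments must be positive")
    return target_dpi / screen_dpi * scale_in_layout


# Chart placed at its original size: zoom to roughly 312%.
print(required_zoom(300))

# Chart placed at half size in the layout: roughly 156% is enough.
print(required_zoom(300, scale_in_layout=0.5))
```

If the viewer or screen uses a different base resolution, only the `screen_dpi` value changes; the proportion stays the same.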

Selecting and deleting the text on the pages with Iceni Infix is really a no-brainer. The time charge for such work will be quite reasonable. And exporting the XML or marked-up text to translate is also quite straightforward:

The exports can be handled in nearly any CAT tool, so TMs and terminology resources can be put to full use. Or you can edit in a simple, free tool like Notepad++ or an XML-savvy editor.
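For a quick look at what text such an export contains before loading it into a CAT tool, a few lines of Python will list every text run in an XML file. This is only a sketch: the element names in the sample (`doc`, `page`, `block`, `b`) are invented for illustration and are not the actual structure of an Infix export, and `iter_text_runs` is a hypothetical helper. The code walks all elements generically, so it does not depend on any particular schema.

```python
import xml.etree.ElementTree as ET


def iter_text_runs(xml_source):
    """Yield (label, text) for every non-empty text run in an XML string."""
    root = ET.fromstring(xml_source)
    for elem in root.iter():
        # Text directly after the element's opening tag.
        if elem.text and elem.text.strip():
            yield elem.tag, elem.text.strip()
        # Text following the element's closing tag (it belongs to the parent).
        if elem.tail and elem.tail.strip():
            yield elem.tag + " (tail)", elem.tail.strip()


# Hypothetical sample; a real export will use different element names.
sample = (
    "<doc><page>"
    "<block>Umsatz 2013</block>"
    "<block>Kosten <b>gesamt</b> in EUR</block>"
    "</page></doc>"
)

for label, text in iter_text_runs(sample):
    print(label, "->", text)
```

The second block in the sample shows why inline formatting tags matter: the sentence is split into three runs around the `b` element, so a tool that treats such a tag as a structural break rather than an inline tag will segment the sentence in the wrong places.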

The screenshot above shows the XML in memoQ. No customization of the default filter is required. Reports from other users who have worked in a similar way indicate that OmegaT and other environments generally have few, if any, problems. In one case there was trouble re-integrating the graphics in a project that also had 50 pages of text, but there may have been other issues I am not aware of in that case.

With the content in the TM, if the chart data are made available in another format, the translations can be transferred quickly to that format for even better results. The same approach can be used for a very wide variety of other electronically generated graphic formats (except some of the really insane ones I've seen where the text is broken up; I don't know if Iceni sanitizes such messes or not).

I think this is an approach which can benefit many of us in a variety of projects. It is not really suited for cases of bitmap graphics, but I have other approaches there in which Iceni Infix may also play a useful role and allow CAT integration. Licenses for the tool are quite reasonably priced, and the trial version (in Pro mode) is entirely suited for testing and learning this process.

7 comments:

Wasaty, February 02, 2014 8:33 AM
I've been using Infix from time to time for over two years now and it's excellent in cases where there's much more form than content, e.g. ads, folders, complicated tables, stuff like that. Unfortunately, when working with the XML export I usually have to modify the import settings, because some tags for different inline formatting are treated as external by default, which generates wrong segmentation. But other than that it's great.
Of course, if a client wants to receive a PDF file back, there's usually a problem with fonts, because folders usually use proprietary fonts I don't have and/or fonts without full UTF support. But that's another story.
Torsten, February 03, 2014 8:01 AM
Sounds good, though I am looking for a Mac alternative for this. Recently I translated a landscape PDF after opening it in LibreOffice (or OpenOffice) Draw and saving it in that format (the memoQ ODF filter converts this format without any problem). It worked rather well and might be an alternative for smaller projects (there were some serious, but resolvable issues with carriage returns).
Vaclav Balacek, February 18, 2014 1:57 PM
Kevin, thanks for sharing this, it opens incredible new opportunities for us - LSPs who have been struggling with PDFs sent by customers which had to be OCRed (a cost no customer would ever be willing to pay because they don't understand WHY you need to play with their files in such a way, why you don't just TRANSLATE them :-)).
I have done a little bit of testing on some short files - forms, for example - and it worked excellently. Have you got experience with processing longer and more complicated files using Infix?