Showing posts with label charts. Show all posts

Feb 1, 2014

The fix is in for PDF charts

Over four years ago, I reviewed Iceni Infix after I began working with it. I'm not as strong a fan as some: I generally have little enthusiasm for editing PDFs directly and dealing with frequent problems such as missing unusual fonts and the guess-my-optimum-font-substitution game. But I do find it useful in many situations, and I found another one of those today.

A new client of a friend works with a horrible German program to produce reports full of charts. The main body of the text is written in Microsoft Word and is available as a reasonable DOCX file, but the charts are a problem, as they are available only in the specific, oddball tool or PDF format. Nobody wants to deal with that software, really. It is supported by no translation tools vendor I am aware of, and like another example of incompatible German software, Across, it enjoys the obscurity it deserves.

After thinking about the approach needed in this case, I realized that if the graphics could be isolated conveniently on pages, the XML export from the PDF document would contain only information from the graphics. After translation, the format could be touched up with Infix before making bitmap screenshots at an enlargement which would yield decent resolution when sized in the layout. Of course, in projects involving multiple languages the XML files could be used with great convenience.

Selecting and deleting the text on the pages with Iceni Infix is really a no-brainer. The time charge for such work will be quite reasonable. And exporting the XML or marked-up text to translate is also quite straightforward:


The exports can be handled in nearly any CAT tool, so TMs and terminology resources can be put to full use. Or you can edit in a simple, free tool like Notepad++ or an XML-savvy editor.



The screenshot above shows the XML in memoQ. No customization of the default filter is required. Reports from other users who have worked in a similar way indicate that OmegaT and other environments generally have few, if any, problems. In one case there was trouble re-integrating the graphics in a project that also had 50 pages of text, but there may have been other issues I am not aware of in that case.


With the content in the TM, if the chart data are made available in another format, the translations can be transferred quickly to that for even better results. The same approach can be used for a very wide variety of other electronically generated graphic formats (except some of the really insane ones I've seen where the text is broken up; I don't know if Iceni sanitizes such messes or not).

I think this is an approach which can benefit many of us in a variety of projects. It is not really suited for cases of bitmap graphics, but I have other approaches there in which Iceni Infix may also play a useful role and allow CAT integration. Licenses for the tool are quite reasonably priced, and the trial version (in Pro mode) is entirely suited for testing and learning this process.

Jul 17, 2013

How would you translate the chart in this DOCX file?

Can anyone tell me quickly the best way to translate the chart in this DOCX file? Or how to get an accurate word count of the words to be translated in the file?

*****

I love to see the different approaches people take to this problem. It's one which I think translators encounter with some frequency, and in the past I took many different approaches to it. Long ago I usually did something involving PDF conversion, editing of the PDF and making a screenshot. But that is inefficient and doesn't allow the use of CAT tools.

Yesterday I picked up a project with 18 of those silly charts embedded in it. A real nuisance. Here's what happens if you try to edit one of those charts in situ:


Hopeless, right? A lot of very authoritative web pages make it clear that without having the linked Excel files, you cannot modify the text. Not true, actually. With or without hints, a number of technically versatile colleagues found ways to solve the problem or at least made close guesses. Some of these are here in the comments. One very interesting exchange on Twitter showed that somehow the settings of the OmegaT import filters can be tweaked to solve this:




The thing about OmegaT is that it's sort of geeky - the solution looks pretty good here, but I can't actually make it work myself.

The solution I worked out last night is very similar to the one described by Stanislas in the comments.
  1. Change the file extension to ZIP
  2. Look inside the ZIP file with Windows Explorer or another suitable tool as described in other blog posts.
  3. Inside the "word" subfolder there is a folder named "charts". It contains XML data with all the chart headings, numbers and labels. Copy it.
  4. Paste a copy of the folder where you want your source files. Import the chart XML files into any CAT tool or XML editor. It's a good idea to configure a filter to exclude and protect the references to the original Excel files with the data. (Though I am curious whether deliberately spoiling these data can protect against the unwanted update that one person worried about in the comments. I'll have to try that.)
  5. When the translations are completed, paste the XML files back inside the charts folder in the file structure.
  6. Rename the extension back to what it was at the start (DOCX in this case). You're done. No refresh necessary (unlike with embedded Excel or PowerPoint objects).
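For anyone who would rather script the steps above than click through Explorer, here is a minimal sketch in Python. It is only an illustration under a few assumptions: the function names and file paths are made up for the example, the chart files are taken to be the XML files sitting directly in the "word/charts" folder, and the translated files keep their original names. Python's zipfile module opens a DOCX as it is, so the renaming in steps 1 and 6 is not needed here; the script writes a new DOCX instead of modifying the original.

```python
import zipfile
from pathlib import Path

CHART_DIR = "word/charts/"  # the folder described in step 3


def is_chart_part(name):
    """True for XML files sitting directly in word/charts/ (not in its subfolders)."""
    rest = name[len(CHART_DIR):]
    return name.startswith(CHART_DIR) and rest.endswith(".xml") and "/" not in rest


def extract_chart_xml(docx_path, out_dir):
    """Steps 1-4: copy the chart XML files out of the DOCX for translation."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            if is_chart_part(name):
                target = out_dir / Path(name).name
                target.write_bytes(archive.read(name))
                extracted.append(target)
    return extracted


def replace_chart_xml(docx_path, translated_dir, result_path):
    """Steps 5-6: write a new DOCX with the translated chart XML swapped in."""
    translated_dir = Path(translated_dir)
    with zipfile.ZipFile(docx_path) as src, \
            zipfile.ZipFile(result_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            candidate = translated_dir / Path(item.filename).name
            # Swap in a translated file only for chart parts that have one
            if is_chart_part(item.filename) and candidate.exists():
                data = candidate.read_bytes()
            dst.writestr(item.filename, data)
```

Typical use would be to call extract_chart_xml("report.docx", "charts_source"), translate the files in that folder with the CAT tool of your choice, export them to a second folder and then call replace_chart_xml("report.docx", "charts_target", "report_en.docx"). All other parts of the document are copied over unchanged.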



A memoQ filter configuration for these XML files can now be found on Kilgray's Language Terminal.
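As for the word count question at the top of this post, a rough figure can be pulled straight from the chart XML. The sketch below is not the memoQ filter mentioned above, just my own approximation of what such a filter has to separate. It assumes the usual layout of DrawingML chart files: typed text such as titles sits in a:t elements, cached labels sit in c:v elements inside c:strCache, and the references to the Excel data sit in c:f elements, which are simply left alone here. The function names are invented for the example, and counting whitespace-separated words will not match any particular CAT tool's statistics exactly.

```python
import xml.etree.ElementTree as ET

# Namespaces used in DOCX chart parts
NS = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "c": "http://schemas.openxmlformats.org/drawingml/2006/chart",
}


def chart_strings(xml_data):
    """Return the text a translator would see: titles and cached text labels."""
    root = ET.fromstring(xml_data)
    # Rich text typed into the chart (titles, axis titles, text boxes)
    strings = [t.text for t in root.iterfind(".//a:t", NS) if t.text]
    # Labels cached from the spreadsheet; numeric caches (c:numCache) are skipped
    strings += [v.text for v in root.iterfind(".//c:strCache/c:pt/c:v", NS) if v.text]
    return strings


def word_count(strings):
    """Count whitespace-separated words across all extracted strings."""
    return sum(len(s.split()) for s in strings)
```

Passing the bytes of each chart file to chart_strings and adding up word_count over all of them gives an estimate for quoting purposes.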