Jun 24, 2017
The other sides of Iceni in Translation
The integration of the online TransPDF service from Iceni in memoQ 8.1 has raised the profile of an interesting company whose product, the Infix PDF Editor, has been reviewed before on this blog. TransPDF is a free service which extracts text content from PDF files, converts it to XLIFF for translation in common translation environments, and then re-integrates the target text from the translated XLIFF to create a PDF file in the target language.
This is a nice thing, though its applicability to my personal work is rather limited, as not many of my clients would be enthusiastic if I were to send PDF files as my translation results. Sometimes that fits, sometimes not. And of course, some have raised the question of whether using this online service is compatible with some non-disclosure restrictions.
I think it's a good thing that Kilgray has provided this integration, and I hope others follow suit, but for the cases where TransPDF doesn't meet the requirements of the job, it is useful to remember Iceni's other options for preparing text for translation.
Translatable XML or marked-up text export
As long as I can remember, the Infix PDF Editor has offered the option to export text on your local computer (avoiding potential non-disclosure agreement violations) so that it can be translated and then re-imported later to make a PDF in the target language. Only the location of this option in the menus has changed: the menu choices for the current version 7 are shown below.
This solution suffers from the same problem as the TransPDF service: not everyone will be happy with the translation in PDF, as this complicates editing a little. However, I find the XML extract very useful to put the content of PDF files into a LiveDocs corpus for reference or term extraction. The fact that Infix also ignores password protection on PDFs is also helpful sometimes.
The Article Tool of the Iceni Infix PDF Editor enables various text blocks on different pages of a PDF file to be marked, linked and extracted in various translatable formats such as RTF or HTML. The quality of the results varies according to the format.
Once "articles" are defined, they are exported via the command in the File menu:
The RTF export has some problems, as this view in Microsoft Word with the format characters made visible reveals:
However, the Simple HTML export opened in Microsoft Word shows no such troubles (and can be saved in RTF, DOCX or other formats):
Use of the article export feature requires a license for the Infix PDF editor, unlike the XML or marked-up text exports for translation. In demo mode, random characters are replaced by an "X" so that one can see how the function works but not receive any unjust enrichment from it. However, this feature has significant value for the work of translators and is well worth an investment, as the results are typically better than using OCR software on a "live" (text-accessible) PDF file.
But wait... there's more!
Version 7 also has an OCR feature:
I tested it briefly on some scanned Portuguese Help Wanted ads that I'll probably use for a corpus linguistics lesson this summer; the results didn't look too awful all considered. This feature is worth a closer look as time permits, though it is unlikely to replace ABBYY FineReader as my tool of choice for "dead" PDFs.