Jun 12, 2012

memoQuickie: using the memoQ PDF filter

The memoQ PDF filter is limited: it works only with PDFs that contain editable text, extracts only plain text, and may have problems with complex layouts (such as multiple columns). A PDF with complex or scanned content requires other tools, such as OCR software (OmniPage, ABBYY FineReader, etc.), to create a source file in RTF, DOC or another format.

To translate a PDF with the memoQ filter, add it in the Project Wizard or via Project home > Translations > Add document. There are no configurable options.

 
Even simple files may have format problems.



Examine the extracted text carefully, compare it to the original and make sure that segmentation and word order are correct. If they are not, editing may be required after translation. Note how the ingredients list in Segment 3 is run together.
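If the extracted text is available as a plain text file, part of this check can be done outside memoQ. The following is only a rough sketch in Python, not anything memoQ provides: it assumes you have the extracted text and a reference text (for example, text copied by hand from a PDF viewer) as strings, and the function names `word_order_diffs` and `long_lines` are made up for this illustration.

```python
import difflib


def word_order_diffs(extracted: str, reference: str):
    """Return (tag, extracted_words, reference_words) for each differing span.

    Both texts are compared word by word, so line breaks and extra spaces
    are ignored and only missing, added or reordered words are reported.
    """
    a = extracted.split()
    b = reference.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


def long_lines(text: str, max_words: int = 40):
    """Flag lines that may be several paragraphs or list items run together.

    The 40-word threshold is an arbitrary assumption; adjust it to the text.
    """
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), 1)
        if len(line.split()) > max_words
    ]
```

An empty list from `word_order_diffs` means the two texts contain the same words in the same order. Anything `long_lines` reports is only a hint that a list or several paragraphs may have been merged, so it still needs a look by eye.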

The target file exported after translation will be plain text. If formatting is needed, it must be applied with other software.

2 comments:

  1. The "compare to the original" part would be easier if we didn't use memoQ to do the comparison.

    Just copy source to target, export the text file, then open it in MS Word or Open/Libre Office.

    Enable the hidden text view so the paragraph marks ¶ can be clearly seen and edit from there.

  2. Good point. It might actually make sense in the workflow to export the text as you suggest, apply formatting in a word processor and save as RTF, DOC, DOCX, ODT or whatever. Then you would have the preview as well.

    Despite the problems that the PDF text extractor Kilgray uses can have with complex layouts, I actually rather like the fact that it gives you naked text, as opposed to the formatted conversions with a surplus of embedded spaces that one gets with SDL Trados. These formatted conversions are also often a royal nuisance if the text is to be laid out in another format later. Better a clean text flow to start, and one without a lot of tag trash: a pure text extract has no tags at all.

    On a few occasions, I have used memoQ's PDF filter to get clean extracts to dump into AntConc or other tools to get a better look at the collocations. I found this more convenient and less prone to errors than using ABBYY FineReader.

    I do wish, however, that Kilgray would steal one idea from the competition. It's rather silly to have to copy the source to the target just to recover the source text. I give my friends at SDL big points for allowing source texts to be exported directly without such nonsense.

