Comments on Translation Tribulations: Twitterview: SDL Trados Studio, memoQ, DVX2 and PDF extraction
Kevin Lossner (http://www.blogger.com/profile/14727800526216764023)

2012-06-05T22:05:56.631+01:00

Recently, I OCRed a PDF file with a very complex layout using ABBYY FineReader 11. I had to edit the resulting Word file very heavily to get something useful. Then I tried to import that Word file into a memoQ project, but memoQ crashed each time.

I thought that this was an opportunity to test DVX2's PDF import. I imported the PDF file into a DVX2 project, copied source to target and exported the result. I got an almost perfect Word file without any manual work. There were far fewer tags than I expected, very few in fact.
I was amazed at the quality of the results.

Maxime Boisset (http://catology.boisset.eu)

2012-04-28T13:07:43.440+01:00

Wanting too much, one ends up having almost nothing.
I confess, I do not believe CAT tools should deepen their capability in this respect. And yes, translators' requirements are absolutely unclear to anybody else...
Nice layout is their goal, and if it is achieved with text boxes, they do not care.
I prefer using specialised tools: PDF Transformer by Nuance, FineReader by ABBYY, and, believe me, PlusTools from the Wordfast tools.
The last of these gives the best results for translation (not layout-preserving) purposes.

Stefan Pecen, simulta, Bratislava, Slovakia (http://www.simulta.sk)

2012-04-26T21:44:32.372+01:00

Interesting article, Kevin. I would completely agree with you about how best to handle PDFs, but I also see a place for a CAT tool to have a method for handling these files as well as it can.
Interestingly, I downloaded your test files and gave them to our filter developer. The original PDF seems to contain very wide spaces (probably some special formatting settings). Because PDF converters are designed to replicate the design, multiple spaces are inserted to cope with this.
Not all files are like these, so it's probably not a consistent problem for Studio... but using a dedicated OCR tool like ABBYY FineReader would clearly do a better job with any PDF.

Anonymous