Oct 22, 2013

Complex dictionaries in memoQ LiveDocs

One never knows when a good idea might come up. This one isn't particularly original; in fact, it's probably bleedin' obvious to the memoQ LiveDocs cognoscenti. I think it's drifted in and out of my mind a few times, but I never gave it much heed until a friend contacted me shortly before midnight with a slightly urgent question about what to do with an "XML terminology" that a client had sent. It turned out to be an SDL MultiTerm XML export, but without a definition file. She wanted the data conveniently available in memoQ. Oh, crap, I thought. This could be a long night. Kilgray has added such capabilities to its qTerm server application, but the Fußvolk who use memoQ desktop versions don't have that option right now. And I shelved my XSLT efforts for this some time ago because nobody seemed seriously interested.

But then she said something about a "Word file". It turned out that the client had made one of those nice RTF dictionary exports that MultiTerm can produce and which was also the target of my XSLT work a year ago. This was exactly what I planned to make for her if the XML proved to be loaded with synonyms and term metadata. It was.

And then... I thought... why not just throw this in LiveDocs as a "monolingual" document? And thus a nice way to make complex glossary data available without importing it into a termbase was (re)born. Of course stuff like this has been going on for ages with Archivarius and other search tools. But not so much in an integrated way with CAT tools. Here's a quick visual tour of the process and the end result:


Here's the RTF file and a peek at the financial term data it contains. Not a chance I can parse that beast for a termbase! So I picked a LiveDocs corpus, clicked Import document and chose the RTF file:


I lied and said it was "German". Well, that's partly true, and in this case the end justifies the means.


A few minutes later, this dictionary was available in an ordinary concordance search, its entire content indexed as "source" text. To see the context, I right-click on the concordance hit to open the document saved in the LiveDocs corpus.


And here it is. I can do further searches within the document using Ctrl+F (Find). The English definition can be copied from here if I feel like doing that.

Now I know what I should do with that huge trilingual fire safety dictionary that's been kicking around my reference folders for the past 10 years... once again, LiveDocs made my day.

3 comments:

  1. LiveDocs is awesome, no question about it!

  2. Hi Kevin,

    actually, after a brief peek at the screenshot of the source RTF, it looks perfectly parseable into a termbase. It looks like you can identify the "source term" by its font size. The "target term" is more complicated, but searching for paragraphs like "EN Status OR Quelle OR Anmerkung OR Bewertung " and replacing them with simply "" would cover all the examples that are available.

    Should that not work, my second try would be to remove everything that's italic, remove extra paragraphs and move every second line to another column.

    Best,
    JH
    Of course, there may be a hundred dragons in what I don't see and you do...

    1. I admire your courage, Jarek :-) If you can find a way to parse those RTFs cleanly despite synonym structures, etc. and have them ready to import without a hitch in a memoQ CSV, I will personally set up a shrine at which many frustrated users will surely worship. Actually, I have a better idea... if you're an XSLT freak who is good with FO... I'd like an RTF or a DOCX with something close to this formatting from the memoQ XML output. Or a roundtrip script to go from the memoQ MultiTerm XML to the fully specified memoQ CSV import. There are a few mixed teams out there suffering without that, and the best I've managed so far are those pretty HTML tables I showed off last year.
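
For anyone tempted to try the cleanup suggested in comment 2, here is a rough Python sketch of the idea. It assumes the RTF has already been saved as plain text with one paragraph per line, that metadata paragraphs start with one of the field labels named in the comment, and that what remains alternates strictly between source and target terms. The function names and the sample lines are made up for illustration; the sketch does not handle synonyms or italic text, which is exactly where real exports are likely to break it.

```python
import csv
import io

# Field labels named in the comment; treating them as line prefixes is an assumption.
LABELS = ("Status", "Quelle", "Anmerkung", "Bewertung")


def pair_terms(lines):
    """Drop blank and metadata lines, then pair every two remaining lines."""
    kept = [ln.strip() for ln in lines
            if ln.strip() and not ln.strip().startswith(LABELS)]
    # Odd-positioned lines are taken as source terms, even-positioned as targets.
    # A trailing unpaired line is silently dropped by zip().
    return list(zip(kept[0::2], kept[1::2]))


def to_tsv(pairs):
    """Write the pairs as tab-separated text (hypothetical two-column layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(pairs)
    return buf.getvalue()


# Made-up sample lines, not taken from the client's file.
sample = [
    "Abschreibung",
    "depreciation",
    "Status: approved",
    "",
    "Bilanz",
    "balance sheet",
    "Quelle: client glossary",
]

pairs = pair_terms(sample)
print(to_tsv(pairs))
```

A single entry with two synonyms would shift every following pair by one line, so the output needs a careful look before it goes anywhere near a termbase import.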


Notice to spammers: your locations are being traced and fed to the recreational target list for my new line of chemical weapon drones :-)