Sep 16, 2012

The Sodrat Suite: delimited text to MultiTerm

The growing library of tools in the Sodrat Suite for Translation Productivity now includes a handy drag & drop script sample for converting simple tab-delimited terminology lists into data which can be imported directly into the generations of (SDL) Trados MultiTerm with which we've been blessed for more than half a decade.

Many people rightly fear and loathe the MultiTerm Convert program from SDL, and despite many well-written tutorials for its use, intelligent, competent adult translators have become all-too-frequent callers on the suicide hotline in Maidenhead, UK.

Thus I've cast my lot with members of an Open Source rescue team dedicated to squeezing a little gain for the victims of all this pain and prescribing appropriate remedies for what ails so many of us by developing the Sodrat Software Suite. The solutions here are quick, but they aren't half as dirty as what some pay good money for.

The script below is deliberately unoptimized. It represents less work than drinking a cup of strong, hot coffee on a cold and clammy autumn morning. Anyone who feels like improving on this thing and making it more robust and useful is encouraged to do so. It was written quickly to cover what I believe is the most common case for this type of data conversion. An 80 or 90% solution is 100% satisfactory in most cases. Copy the script from below, put it in a text file and change the extension to VBS, or get the tool, a readme file and a bit of test data by clicking the icon link above.

To run the conversion, put your tab-delimited text file in the folder with the VBS script and drag it onto the script's icon. The first line of the file must contain the language names, one per column, because the script reads that line for the language fields. The MultiTerm XML import file is created in the same folder under the name of the original term file with "-MultiTerm.xml" appended.
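The script reads the dropped file's path from WScript.Arguments, so double-clicking it without dropping a file fails with a subscript error. A minimal guard might look like the sketch below. GetDroppedFile is a hypothetical helper name, not part of the original tool, and the check covers only the two most obvious failure cases.

```vbscript
' Sketch only: GetDroppedFile is a hypothetical helper, not part of the original script.
' Returns the path of the dropped file, or "" if nothing usable was dropped.
Function GetDroppedFile(objArgs, objFSO)
    GetDroppedFile = ""
    ' No arguments: the script was double-clicked instead of receiving a dropped file
    If objArgs.Count = 0 Then Exit Function
    ' Not an existing file: for example, a folder was dropped
    If Not objFSO.FileExists(objArgs(0)) Then Exit Function
    GetDroppedFile = objArgs(0)
End Function

' Possible use at the top of the conversion script:
'   Set objFSO = CreateObject("Scripting.FileSystemObject")
'   inFile = GetDroppedFile(WScript.Arguments, objFSO)
'   If inFile = "" Then
'       WScript.Echo "Drag a tab-delimited text file onto this script's icon."
'       WScript.Quit 1
'   End If
```

Returning an empty string rather than quitting inside the helper leaves the choice of error message to the calling script.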

Drag & Drop Script for Converting Tab-delimited
Bilingual Data to MultiTerm XML

ForReading = 1
Set objArgs = WScript.Arguments
inFile = objArgs(0) ' name of the file dropped on the script

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile(inFile, ForReading)

' read first line for language field names
strLine = objFile.ReadLine
arrFields = Split(strLine, chr(9))

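' XML declaration and root element
' (element names assume the MultiTerm XML layout: mtf > conceptGrp > languageGrp > language, termGrp > term)
outText = "<?xml version=" & chr(34) & "1.0" & chr(34) & " encoding=" & chr(34) & "UTF-16" & chr(34) & "?>" & chr(13) & "<mtf>" & chr(13)

Do Until objFile.AtEndOfStream
 strLine = objFile.ReadLine
 if strLine <> "" then
  arrTerms = Split(strLine, vbTab)

  ' one concept per line of the input file
  outText = outText & "<conceptGrp>" & chr(13)
      for i = 0 to (UBound(arrTerms) )
        ' language name comes from the matching column of the header line
        outText = outText & chr(9) & "<languageGrp>" & chr(13) & chr(9) & chr(9) _
                   & "<language type=" & chr(34) & arrFields(i) & chr(34) & "/>" & chr(13)
        ' write the term
        outText = outText & chr(9) & chr(9) & chr(9) & "<termGrp><term>" & _
               arrTerms(i) & "</term></termGrp>" & chr(13) & chr(9) & "</languageGrp>" & chr(13)
      next
  outText = outText & "</conceptGrp>" & chr(13)
 end if
Loop

outText = outText & "</mtf>"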
objFile.Close
outFile = inFile & "-MultiTerm.xml"

' second param is overwrite, third is unicode
Set objFile = objFSO.CreateTextFile(outFile,1,1)
objFile.Write outText
objFile.Close
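
One easy robustness improvement: the script writes each value of arrTerms(i) into the XML as-is, so a term containing an ampersand or an angle bracket would produce a file that is not well-formed. The sketch below shows one way to escape such characters. EscapeXml is a hypothetical helper name, not part of the original tool.

```vbscript
' Sketch only: EscapeXml is a hypothetical helper, not part of the original script.
Function EscapeXml(strText)
    Dim strOut
    ' Replace the ampersand first so the entities added below are not escaped again
    strOut = Replace(strText, "&", "&amp;")
    strOut = Replace(strOut, "<", "&lt;")
    strOut = Replace(strOut, ">", "&gt;")
    ' Quotation marks matter when the value ends up inside an attribute
    strOut = Replace(strOut, Chr(34), "&quot;")
    EscapeXml = strOut
End Function
```

To use it, paste the function into the script and write EscapeXml(arrTerms(i)) wherever a term is added to outText. The same applies to the language names read from the first line.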


2 comments:

  1. Paul F has an interesting and recent post on this topic on his blog: http://multifarious.filkin.com/2012/09/17/glossaries-made-easy/
    It's worth a look.

  2. Paul has done a lot of good posts. I think his blog is probably the best source of information on the SDL Trados Studio line you'll find (which should not be a surprise given his job, but it's not a given that someone in that position will be articulate like he is too). It's a good tool in the works, but it's a shame that its use requires Excel. Some of those formats could be handled well enough without it. For that matter, one could probably use the APIs of some of the Open Source tools to read Excel files if really necessary. Of course Studio 2009 users are out in the cold and will have to be satisfied with lesser solutions like the limited script here. Still, he raised a very good point about the desirability of going straight into the termbase if possible, because many people do find MultiTerm imports a bit baffling. In any case, I look forward to when the tool he described is released; it will be very useful.

    Paul's recent series on regex (3 parts) is also very worthwhile, worth reading even if you aren't a Studio user.

