Many people rightly fear and loathe SDL's MultiTerm Convert program, and despite many well-written tutorials on its use, intelligent, competent adult translators have become all-too-frequent callers to the suicide hotline in Maidenhead, UK.
Thus I've cast my lot with members of an Open Source rescue team dedicated to squeezing a little gain for the victims of all this pain and prescribing appropriate remedies for what ails so many of us by developing the Sodrat Software Suite. The solutions here are quick, but they aren't half as dirty as what some pay good money for.
The script below is deliberately unoptimized. It represents less work than drinking a cup of strong, hot coffee on a cold and clammy autumn morning. Anyone who feels like improving on this thing and making it more robust and useful is encouraged to do so. It was written quickly to cover what I believe is the most common case for this type of data conversion. An 80 or 90% solution is 100% satisfactory in most cases. Copy the script below into a text file and change the file extension to .vbs, or get the tool, a readme file and a bit of test data by clicking the icon link above.
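To run the conversion, just put your tab-delimited text file in the folder with the VBS script and drag the file onto the script's icon. The first row of the file must contain the language names, one per column; every following row is one termbase entry. The MultiTerm XML import file is created in the same folder as the input file and is named after it, with "-MultiTerm.xml" appended (terms.txt becomes terms.txt-MultiTerm.xml).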
Drag & Drop Script for Converting Tab-delimited Bilingual Data to MultiTerm XML
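' Row 1 of the input holds the language names, one per column; each later row is one entry.
' Output uses the MultiTerm XML elements mtf / conceptGrp / languageGrp / termGrp / term.
Const ForReading = 1
Set objArgs = WScript.Arguments
If objArgs.Count = 0 Then
  WScript.Echo "Drop a tab-delimited text file onto this script's icon."
  WScript.Quit
End If
inFile = objArgs(0) ' name of the file dropped on the script
Set objFSO = CreateObject("Scripting.FileSystemObject")
' no format argument is passed, so the input is read as ANSI text
Set objFile = objFSO.OpenTextFile(inFile, ForReading)
' read first line for language field names
strLine = objFile.ReadLine
arrFields = Split(strLine, vbTab)
outText = "<?xml version=" & Chr(34) & "1.0" & Chr(34) & " encoding=" & Chr(34) _
  & "UTF-16" & Chr(34) & "?>" & vbCrLf & "<mtf>" & vbCrLf
Do Until objFile.AtEndOfStream
  strLine = objFile.ReadLine
  If strLine <> "" Then
    arrTerms = Split(strLine, vbTab)
    outText = outText & "<conceptGrp>" & vbCrLf
    ' one languageGrp per column; cells beyond the header row are ignored
    For i = 0 To UBound(arrTerms)
      If i > UBound(arrFields) Then Exit For
      If Trim(arrTerms(i)) <> "" Then
        ' assumption: the header text is used for both the index name (type) and
        ' the language code (lang); adjust if your termbase expects codes such as "EN"
        outText = outText & vbTab & "<languageGrp>" & vbCrLf _
          & vbTab & vbTab & "<language type=" & Chr(34) & arrFields(i) & Chr(34) _
          & " lang=" & Chr(34) & arrFields(i) & Chr(34) & "/>" & vbCrLf _
          & vbTab & vbTab & "<termGrp>" & vbCrLf
        ' write the term, escaping the characters XML reserves (& first)
        strTerm = Replace(Replace(Replace(arrTerms(i), "&", "&amp;"), "<", "&lt;"), ">", "&gt;")
        outText = outText & vbTab & vbTab & vbTab & "<term>" & strTerm & "</term>" & vbCrLf _
          & vbTab & vbTab & "</termGrp>" & vbCrLf & vbTab & "</languageGrp>" & vbCrLf
      End If
    Next
    outText = outText & "</conceptGrp>" & vbCrLf
  End If
Loop
outText = outText & "</mtf>" & vbCrLf
objFile.Close
outFile = inFile & "-MultiTerm.xml"
' second param is overwrite, third is Unicode (UTF-16, matching the XML declaration)
Set objFile = objFSO.CreateTextFile(outFile, True, True)
objFile.Write outText
objFile.Close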
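The script reads its input as ANSI text, because OpenTextFile is called without its optional fourth (format) argument. A glossary saved from Excel as "Unicode Text" is UTF-16 with a byte order mark, and read as ANSI its terms come out garbled. One way to make the script a bit more robust is sketched below, assuming that Unicode input always starts with the UTF-16 little-endian byte order mark (bytes FF FE) and that the system uses a single-byte ANSI code page. DetectFormat is a hypothetical helper, not part of FileSystemObject; it only returns the value to pass as the format argument.

```vbscript
' Hypothetical helper (not part of FileSystemObject): returns the value for
' OpenTextFile's fourth (format) argument.
'   -1 = TristateTrue  -> open as Unicode (UTF-16)
'    0 = TristateFalse -> open as ANSI
' Assumes a single-byte ANSI code page such as Windows-1252.
Function DetectFormat(objFS, strPath)
  Dim objTest, strBom
  DetectFormat = 0 ' default: ANSI, as the original script assumes
  If objFS.GetFile(strPath).Size < 2 Then Exit Function
  ' read the first two bytes in ANSI mode
  Set objTest = objFS.OpenTextFile(strPath, 1, False, 0)
  strBom = objTest.Read(2)
  objTest.Close
  ' FF FE is the byte order mark of UTF-16 little-endian text
  If Asc(Mid(strBom, 1, 1)) = 255 And Asc(Mid(strBom, 2, 1)) = 254 Then
    DetectFormat = -1
  End If
End Function
```

VBScript lets a Function be defined after the code that calls it, so the function can be pasted at the end of the script and the line that opens the input changed to Set objFile = objFSO.OpenTextFile(inFile, ForReading, False, DetectFormat(objFSO, inFile)). FileSystemObject only handles ANSI and UTF-16, so UTF-8 files would still need another approach, such as the ADODB.Stream object.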

 
 
Paul F has an interesting and recent post on this topic on his blog: http://multifarious.filkin.com/2012/09/17/glossaries-made-easy/
It's worth a look.
Paul has written a lot of good posts. I think his blog is probably the best source of information on the SDL Trados Studio line you'll find (which should not be a surprise given his job, but it's not a given that someone in that position will also be as articulate as he is). It's a good tool in the works, but it's a shame that it requires Excel. Some of those formats could be handled well enough without it; for that matter, one could probably use the APIs of some of the Open Source tools to read Excel files if really necessary. Of course, Studio 2009 users are out in the cold and will have to be satisfied with lesser solutions like the limited script here. Still, he raised a very good point about the desirability of going straight into the termbase if possible, because many people do find MultiTerm imports a bit baffling. In any case, I look forward to the release of the tool he described; it will be very useful.
Paul's recent three-part series on regex is also very worthwhile, even if you aren't a Studio user.