"Rogue codes", "junk tags" - whatever you choose to call them, they can be a real nuisance for those attempting to work with Microsoft Word and RTF formats in translation environment tools. Previously, I described the use of Dave Turner's Code Zapper macros to clean up junk in DOC and RTF files. Now, for those who work with Microsoft Word 2007 DOCX format, there's another option with memoQ. In the current version of memoQ (4.2.8), there's an option in the "Add document as..." dialog that enables minor format change tags to be ignored. In a practical sense, this works much like Code Zapper.
The relevant option is marked here with a red box:
If that option is not selected, here's what a DOCX file made from the OCR of a PDF file might look like:
If the option is selected as shown in the screenshot of the dialog above, here's what the result looks like:
That's much easier to deal with. There are so many interesting and useful little things in the latest release of memoQ that I probably won't find them all before the next major upgrade. It's not easy keeping up with the progress of an active, dynamic development group like the one at Kilgray.
If you have a "dirty" DOC file with a lot of unwanted, superfluous tags, an obvious strategy is to save it as DOCX (also possible using the 2007 compatibility pack for MS Office 2003), follow the procedure described above, then after export re-save the DOCX file as DOC again if necessary.
An exploration of language technologies, translation education, practice and politics, ethical market strategies, workflow optimization, resource reviews, controversies, coffee and other topics of possible interest to the language services community and those who associate with it. Service hours: Thursdays, GMT 09:00 to 13:00.
Jun 4, 2010
Jun 3, 2010
Dealing with embedded XML and HTML in an Excel file
One of the occasionally gratifying aspects of translation for an IT geek like me is that IT challenges continue to follow me. Actually, that's one of the things about the current state of the profession that I hate too. (I'm a not-so-closeted Luddite.)
This week's challenge was more of a fun puzzle, because it wasn't my problem, but rather someone else's. An agency owner friend sent me an Excel file that was driving him nuts; his localization engineer, a former star at a Top Ten agency, had pronounced the task of filtering the data in a useful way to be impossible. I love it when engineers say something is impossible; it usually means there is a simple solution at hand if one gives the matter a little real thought.
The file structure looked something like this:
Only the yellow columns were to be translated; some had plain text content (with line breaks in some cases), others had XML or HTML content.
Just for fun, I fired off a quick support request to Kilgray along with a copy of my test file, because I thought maybe there was a cascading filter feature I might have overlooked. (There isn't, but the idea was noted as a good one, so maybe we'll see it in the future.) In any case, Denis Hay offered a creative suggestion as he almost inevitably does:
Hi Kevin,
While waiting for "cascading filters" (which I also find a great idea), what you could do is simply copy these Excel columns to a Word table, then use either Tortoise Tagger, or preferably the +Tools from the Wordfast website to tag the HTML/XML content. Import that tagged Word file into memoQ, and you should get what you wanted.
Once translated, just paste back to Excel.
Kind regards,
Denis Hay
Technical consulting and training
Kilgray Translation Technologies
There's another way I discovered by the time Denis' suggestion arrived. It works well manually, but it can also be automated with macros if you're dealing with content management system exports where the structure recurs and you'll be doing a lot of this.
Do the following:
- Copy each individual Excel column of interest (or at least the ones with XML/HTML) into a plain text file.
- In the case of the text files with tagged content (i.e. XML or HTML), change the file extension to fit the content (e.g. "text2.txt" becomes "text2.xml").
- Translate the text files with your favorite translation environment tool, using the filters appropriate for each type of content.
- After exporting the files from your working environment, copy and paste the text file content back into the corresponding columns of the original Excel file. Note that if there are line breaks somewhere, your row positions may get screwed up. This can be solved by performing this operation in OpenOffice Calc. (Maybe there's an appropriate setting for Excel to avoid this problem, but I don't know it.)
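For recurring exports, the round trip in the steps above could probably be scripted instead of done by copy and paste. What follows is only a rough sketch in Python, not the workflow I actually used: it assumes the worksheet has been saved as CSV, and the function names and the "{{BR}}" placeholder are made up for illustration. The idea is to swap line breaks inside a cell for a placeholder on export, so that line N of the text file always matches row N of the sheet, and to swap them back on import.

```python
import csv
import io

# Hypothetical placeholder for line breaks inside a cell. It is assumed not
# to occur in the real content and to survive translation unchanged.
NEWLINE_TOKEN = "{{BR}}"


def read_csv(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    return [list(row) for row in csv.reader(io.StringIO(text, newline=""))]


def write_csv(rows):
    """Serialize rows back to CSV text; cells with line breaks get quoted."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def export_column(rows, col, token=NEWLINE_TOKEN):
    """Return column `col` as text with exactly one cell per line."""
    lines = []
    for row in rows:
        cell = row[col] if col < len(row) else ""
        # Normalize Windows line endings, then hide in-cell line breaks.
        lines.append(cell.replace("\r\n", "\n").replace("\n", token))
    return "\n".join(lines)


def import_column(rows, col, text, token=NEWLINE_TOKEN):
    """Write translated lines back into column `col` of `rows`, in place."""
    lines = text.replace("\r\n", "\n").split("\n")
    # Tolerate a single trailing newline added by an editor or tool.
    if len(lines) == len(rows) + 1 and lines[-1] == "":
        lines.pop()
    if len(lines) != len(rows):
        raise ValueError(
            f"row count mismatch: {len(lines)} lines for {len(rows)} rows"
        )
    for row, line in zip(rows, lines):
        while len(row) <= col:
            row.append("")  # pad short rows so the column exists
        row[col] = line.replace(token, "\n")
    return rows
```

In use, you would call read_csv on the saved sheet, write the result of export_column for each yellow column to its own file (with an .xml or .html extension where that fits), translate those files, then feed each translated file to import_column and save the result with write_csv. The row count check is there to catch shifted rows before they get pasted back.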
May 31, 2010
The Oberhaveler Stammtisch returns!
This is a somewhat belated announcement, but hey - life is busy. Last year a group of local translators met regularly at a local Italian restaurant to enjoy a relaxing evening of wine and gossip until vacation schedules and the closure of the locale put things on hold. I wanted to revive things for many months, but I simply lacked the time for organization. Then one happy day, colleague Andreas Linke (NL/EN > DE) called to say hello and ask if I might be interested in a local gathering. Andreas has organized a translators' coffee chat in Berlin Kreuzberg for years, which in fact Jost Zetsche had recommended highly, but I never found the time to go into town and meet him. (For an extensive list of such meetings in Germany, see http://www.aticom.de/Stammtisch-Termine.pdf.) He kindly offered to take over the burden of organizing the local meetings, so I passed on the e-mail addresses of our little group.
Gatherings are planned for the third Thursday of each month at 7:00 pm. Our first get-together was in April at a local coffee house in Birkenwerder. Here are a few photos from the evening:
That's possibly the worst picture of me in the past decade. I didn't have that much to drink, really. Although the atmosphere and food in the restaurant were well suited to such a meeting, the attitude of the service personnel at Kaffeehaus Birkenwerder made it clear why some of the reviews on Qype were not very favorable. After my partner tried to meet us shortly before 10 pm and was turned away at the door, I resolved never to set foot in that joint again.
The following month (May) we switched to a beautiful lakeside restaurant about 10 minutes' walk from the city rail station:
A bit more expensive, but you can't beat the view and the service is great. We had a good time; wine and beer flowed freely and everything eventually went to the dogs:
So it looks like our new meeting location every third Thursday at 7 pm will be
Gasthaus am Boddensee
Brieseallee 20
16547 Birkenwerder
(http://www.boddensee.com)
Anyone who would like to get added to the notification mailing list should contact Andreas Linke (linke.andreas [at] gmail.com).