Translation Tribulations: search

Nov 4, 2019

Finding a file in some random memoQ project

One of the nice things about new users of any software is that they approach it without the ingrained habits of routine users of that software and often ask useful questions that the rest of us might not have considered. One such question was posed today by Aloísio Ferreira, a translation student at FCSH/NOVA in Lisbon. He thought it would be useful if there were a function to help translators or project managers locate a file in some memoQ project no longer remembered. I can see the point of this; on a number of occasions in the past, I have clicked around through various projects trying to do just such a thing, and I had not considered any better way of achieving my objective.

As many of you already know, Windows Explorer can in some cases index and search the content of files, and I knew that the majorVersionStore.info files in the memoQ project subfolders included the names of files. So I went through the steps needed to ensure that the file contents would be indexed as plain text. All quite unnecessary, as it turned out.

Feeling very sophisticated after updating which folders were to be indexed, I tested the idea in the folder window for my memoQ projects, which contains all the subfolders with the name of each project. As you can see from the results in the screenshot above, the projects also contain placeholder files (0 bytes in size) with the names of the files imported to translate.

So the short answer to Aloísio's question is that no new feature programming is needed in memoQ; simply go to your projects folder and do a search with part of the filename (use quotes if there are spaces in the name, as in the example above), and the path for the files in the results will show you which projects have what you are looking for.
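The same lookup can be scripted if you prefer not to click through Explorer. This is only a minimal sketch in Python, assuming the layout described above: one subfolder per project, each containing placeholder files named after the imported documents. The function name is hypothetical, and nothing here is a memoQ feature.

```python
from pathlib import Path


def find_projects_with_file(projects_root, name_part):
    """Return {project folder name: [matching file names]} for a partial file name."""
    root = Path(projects_root)
    needle = name_part.lower()
    hits = {}
    for path in root.rglob("*"):
        if path.is_file() and needle in path.name.lower():
            # The first path component below the root is taken to be the project folder.
            project = path.relative_to(root).parts[0]
            hits.setdefault(project, []).append(path.name)
    return hits
```

Calling find_projects_with_file with the path of your own projects folder and a fragment of the file name returns the project folders to type into the Dashboard filter.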

From there you can use part of the project name in the filter field of the memoQ Dashboard to find the project you need, open it and work with the file in some way.

And of course once you have opened the project, if there are a lot of files in the list of Project home > Translations, there is another filter you can use to zero in on the one(s) you want quickly:

This screenshot is from a project with two target languages, created by the useful PM Edition of memoQ

What good is all this? It depends. I usually go on a hunt like this if I am given a new version of some file I translated years ago and can't remember which project it was in; once I find the old project, I can use the X-translate feature so that the pretranslation will use and lock any unchanged blocks of text from the old version. This can also be used (indirectly) to figure out which heavy resources (attached to the old project) may be useful for other work. I'm sure you can come up with half a dozen reasons of your own if you think about it.

Oct 29, 2019

Bilingual EU legislation the easy way in #xl8

Translators of European languages based in the EU and many others deal often with citations of EU legislation or need to consult relevant EU legislation for terminology in their translations. One popular source of information for that is the EUR-LEX website, which provides a convenient archive of legislation and related information, with the possibility of multilingual text displays, as seen here:

Some years ago, I published a description of how data from these multilingual EUR-LEX displays can be transferred to translation memories or other corpora for reference purposes, and more recently I produced a video showing this same procedure. But some people don't like the paragraph-level alignment format of the EUR-LEX displays, and these can also occasionally be seriously out of sync for some reason, as in this example (or worse):

Now I don't find that much of a nuisance when I use memoQ LiveDocs, because I can simply view the full bilingual document context and see where the corresponding information really is (kind of like leaving alignments in memoQ uncorrected until you actually find a use for the data and determine that the effort is worthwhile), but if you plan to feed that aligned data to a translation memory, it's a bit of a disaster. And many people prefer data aligned at the sentence level anyway.

Well, there is a simple way to get the EU legislation texts you want, aligned at the sentence level, with the individual bitexts ready to import into a translation memory, LiveDocs corpus or other reference tool. See that document number above with the large red arrow pointing to it? That's where you start....

Did you know that much of the information available in EUR-LEX is also available in the publicly available DGT translation memories? These are sentence-level alignments. But most people go about using this data in a rather klutzy and unhelpful way. The "big data" craze some years ago had a lot of people trying to load this information into translation memories and other places, usually with miserable results. These include:

the inability to load such enormous data quantities in a CAT tool's TM without having far more computer RAM than most translators ever think they'll need;
very slow imports, some apparently proceeding on a geological time scale;
data overload - so many concordance hits that users simply can't find the focused information they need; and
system performance degradation, with extremely sluggish responses in a wide variety of tasks.

Bulk data is for monkeys and those who haven't evolved professionally much beyond that stage. Precision data selection makes more sense, and enables better use of the resources available. But how can you achieve that precision? If I want the full bilingual text of EU Regulation No. 575/2013 in some language pair, for example, with sentence-level alignment, how can I find that quickly in the vast swamp of DGT data?

Years ago, I published an article describing how it is better to load the individual TMX files found in the downloadable ZIP archives from the DGT into LiveDocs so that the full document context can be seen from the concordance searches. What I didn't mention in that article is that the names of those individual TMX files correspond to the document numbers in EUR-LEX.

Armed with that knowledge, you can be very selective in what data and how much you load from the DGT collection. For example, if you organize the data releases in folders by year...

... and simply unpack the ZIP files in each year's folder...

... each folder will contain TMX files...

... the names of which correspond to the document number found in EUR-LEX. So a quick search in Windows Explorer or by other means can locate the exact document you want as a TMX file ready to import into your CAT tool:

These TMX files typically contain 24 EU languages now, but most CAT tools will filter just the language pair you want. So the same file can usually give you Polish+French, German+English, Portuguese+Greek or whatever combination you need among the languages present.
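If you would rather script the file search than use Windows Explorer, something like the following Python sketch would do. It assumes, as described above, that the unpacked TMX files sit in per-year folders and that each file name contains the EUR-LEX document number; the function name is hypothetical.

```python
from pathlib import Path


def find_tmx(dgt_root, document_number):
    """List TMX files under dgt_root whose file name contains the document number."""
    needle = document_number.lower()
    return sorted(
        path
        for path in Path(dgt_root).rglob("*")
        if path.is_file()
        and path.suffix.lower() == ".tmx"
        and needle in path.name.lower()
    )
```

For Regulation No. 575/2013, for example, the document number shown in EUR-LEX is 32013R0575, so that is the text to pass as document_number.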

I still prefer to import my TMX data into a LiveDocs corpus in memoQ, and there I can use the feature to import a folder structure, and in the import dialog, I simply write the name of the file I want, and all other files (thousands of them) are promptly excluded:

After I enter the file name in the Include files field, I click the Update button to refresh the view and confirm that only the file I want has been selected. Depending on where in memoQ you do the import, you may have to specify the languages to extract (in the Resource Console) or not (in a project, where the languages are already set). Of course, the data can also be imported to a translation memory in memoQ, but that is an inferior option, because then it is not possible to read the reference document in a bilingual view as you can in a LiveDocs corpus; only isolated segments can be viewed in the Concordance or Translation results pane.

How you work with these data and with what tools is up to you, but this procedure will provide you with a number of options for better data selection and improved access to the reference data you may need for EU legislation without getting stuck in the morass of millions of translation units in a performance-killing megabomb TM.

Jul 31, 2019

URL-based searches of your Google Drive

Just before a recent short holiday, I ran across an article from 2017 which described how to search Google Drive directly from Chrome's address bar. "Interesting," I thought, and with the possibility of integrating such Google Drive searches with IntelliWebSearch or memoQ's integrated web search feature (or similar features in other environments) in mind, I shared the link with a few friends.

Google Drive and its application suite, which includes Google Docs (the word processor) and Google Sheets (the spreadsheet application), offer many possibilities for helping in language projects, collaborative and otherwise. I have written extensively about these possibilities with terminology (here, for example, and in a number of related articles). But these earlier investigations involved specific documents and viewing these - or selected portions of them - in a web browser window. Searching a number of files of various types on one's Google Drive ("My Drive") or a subfolder thereof is a little different, and possibly more useful in some circumstances, such as in a group project where multiple participants are contributing to a shared reference folder (though this folder will have to be added to the "My Drive" of each collaborator).

Google's Help for the relevant search function explains:

You can find files in Google Drive, Docs, Sheets, and Slides by searching for:

File title

File contents

Items featured in pictures, PDF files, or other files stored on your Drive

You can only search for files stored in My Drive. Files stored in folders shared with you won't appear in your search unless you add the folders to My Drive.

You can also sort and filter search results.

It all starts with a basic URL, such as

https://drive.google.com/drive/search?q=SOMETEXT

Execute that in your browser's address bar, replacing SOMETEXT with your desired search expression, and you'll get a hit list of all files on your Google Drive which include that text in the title or contents. In a tool like memoQ Web Search, SOMETEXT is replaced by the placeholder for search text that the application uses ({} in the case of memoQ Web Search). With a little experimentation, you'll soon find the additional arguments to search specific file types or folders.

For example, if I want to do a search in the "Other" subfolder on my Google drive, I can discover the URL arguments by starting a manual search and just reading the address bar:

The parameter to use for a specific folder search is "parent", followed by a colon and the coded ID of that folder.
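To make the URL structure concrete, here is a minimal Python sketch that assembles such a search address. It rests on one assumption that should be checked against your own address bar: that the "parent:" restriction is written inside the q expression, in front of the search text. The function name and the folder ID in the usage note are hypothetical.

```python
from urllib.parse import quote

DRIVE_SEARCH = "https://drive.google.com/drive/search?q="


def drive_search_url(text, folder_id=None):
    """Build a Google Drive search URL, optionally limited to one folder."""
    expression = text
    if folder_id:
        # Assumption: the folder restriction is part of the q expression.
        expression = f"parent:{folder_id} {text}"
    # Spaces and other special characters are percent-encoded; the colon is kept.
    return DRIVE_SEARCH + quote(expression, safe=":")
```

With a made-up folder ID, drive_search_url("turnip", "FOLDER123") gives an address ending in q=parent:FOLDER123%20turnip.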

An example of a folder search with a specific text segment is in the screenshot above; this was taken while configuring and testing the search in a memoQ Web Search profile. One document containing the search text "turnip" was found in the folder. To view the document, right-click on it in the hit list and choose Preview.

Search inside the preview of a document found in a Google Drive search with memoQ Web Search

Unfortunately there seems to be a bug in the memoQ Web Search - which now uses Chromium - because double-clicking the document tries to open it in the old search engine based on Internet Explorer, where I was not logged in to Google.

An Internet Explorer window, bizarrely launched by the Chromium-based memoQ Web Search

In fact, you'll have to log in to Google each time you open the memoQ Web Search window (a total nuisance), so it's better to leave it open in the background, even though the current bug in which the web search window is no longer brought to the forefront can make this inconvenient. In other tools this may not be an issue.

The Chromium/IE issue as well as the focus and login hassles with memoQ's web search have been reported to memoQ Support; I look forward to seeing how these are handled. Nonetheless, this Google Drive search seems to have significant potential for individuals and teams to build searchable document collections in the folders of a Google Drive account. Try it in your working environment and share your findings!

Jun 4, 2019

Regular expressions in memoQ demystified - THE workshop!

Next week in Utrecht there will be a unique workshop to enhance your productivity with memoQ, as you learn how to develop rules for automated formatting and QA of patterned expressions, such as dates, currency expressions, unusual or custom text formats and more. THIS knowledge is one of those "secret weapons" that I deploy to help the most sophisticated financial and legal translators I know save countless hours of mind-numbing donkey work doing QA on things like legal references and expressions involving currency (such as EUR 3 million vs. €3m, etc.) or creating those references in the first place and inserting them in the translation with a simple keystroke.

The course instructor, Marek Pawelec, is one of my personal resources when I am in over my head on technical problems or when I need to be very sure that a client of mine gets the right help in time. He has a rare gift of taking subject matter which many find baffling and presenting it in a way that makes it accessible to most any educated adult.

Because of the scope of this subject matter and the importance of proper follow-up and support while learning it, the workshop will be held over two days - June 10 and 11 (Monday and Tuesday) - from 10 am to 4 pm each day, which will give plenty of time to learn the basics and move on to apply your new technical skills to common and not-so-common technical challenges in translation projects where memoQ is involved.

Trust me on this one: we are talking about critical process secrets to save massive amounts of time and do better work on things like annual reports, court briefs and more. Or creating projects for text formats that seem impossible to work with at first glance. THIS is where the money is in an increasingly competitive market.

Information to register now can be found on the Facebook event page for the workshop or on the relevant Regex Workshop page for the host, the All Round Translator education cooperative in the Netherlands.

Nov 26, 2017

MS Word Macros to Speed up Translation-Related Terminology Research

Guest post by Tanya Harvey Ciampi, English translator (DE/FR/IT>EN)

Is your terminology research slowing you down?

When we translate Microsoft Word documents, we often find ourselves having to leave Word to look up terms online, for example in monolingual dictionaries for definitions, in bilingual dictionaries or translation memory databases for translations, on specific reputable websites (such as newspaper websites) to double-check usage or frequency of use, or on clients’ own multilingual websites to check how certain terms have been translated in the past to ensure consistent use of terminology.

This sort of research involves switching to a browser, copying and pasting or retyping our term into a search box, possibly adding specific search criteria, and finally launching a search: all that typing and clicking can be time-consuming and easily cause us to become lost among the many windows opened.

Macros to the rescue!
This is where macros come in. A macro is essentially a short sequence of commands that automates repetitive tasks. Macros cost nothing to create and can be tweaked to do exactly what you need them to do, based on your specific language combinations and favourite online terminology resources, providing these lend themselves to this sort of querying.

How do macros work?
A macro consists of code, which you simply need to copy and paste into the Macros section of Word. That done, you then need to assign an icon to the macro and add it to your toolbar to launch the macro with a single click every time you need it. If you wish, you may also assign a specific key combination to the macro (for example CTRL plus a key of your choice) so that you can launch the macro from your keyboard, too.

From now on, when translating a text in Word, all you need to do is place your cursor on a word that you wish to look up and click on the corresponding icon in your toolbar (or use the assigned key combination) to launch the search. That’s all there is to it!

A few examples of macros and what they can do for you:

SCENARIO: Imagine...
SOLUTION... with a single click!

Scenario: ...you need to look up a term in the bilingual dictionaries www.leo.org and www.dict.cc but this requires opening your browser, browsing to both dictionaries separately and pasting in or retyping your search term on each website... quite time-consuming!
Solution: A macro to search both dictionaries at once taking your word from MS Word and inserting it automatically in both dictionaries for you... with a single click from within Word. (This macro can be adapted to all sorts and any number of websites.) What this macro does essentially is launch a Google search from within Word, adding specific search criteria, in this case: “your search term” inurl:leo.org or inurl:dict.cc

Scenario: ...you wish to run a search in the online translation memory database www.linguee.com (or linguee.de, linguee.fr, linguee.it etc.) to check how other translators have translated a certain term or expression.
Solution: A macro to search Linguee taking your word from MS Word and inserting it directly in the Linguee search engine with a single click from within Word. This macro produces a list of source- and target-language sentences containing your search term along with context.

Scenario: ...you are translating a text and need to check how a particular expression is used. You decide to search reputable sources such as high-quality newspapers to check usage and/or frequency of use of a specific term or expression. Where do you look?
Solution: A macro to search specific newspaper websites which you consider reputable sources from within Word. (This macro can be adapted to all sorts and any number of websites.) This macro essentially launches a Google search from within Word, adding specific search criteria to target a specific website, for example: “your search term” inurl:guardian.co.uk

Scenario: ...you are translating for a company that has a multilingual website and you need to check how a specific term has been translated in the past.
Solution: A macro to search for the term on a specific multilingual website from within Word. This macro can be extended to cover various related multilingual websites. In banking, for example, these might include the following: www.ubs.com, www.credit-suisse.com, www.raiffeisen.ch. This macro essentially launches a Google search from within Word, adding specific search criteria, for example: “your search term” site:www.ubs.com or site:www.credit-suisse.com or site:www.raiffeisen.ch

Scenario: ...you are translating a text and can't find an appropriate translation of an expression or technical term in any dictionary.
Solution: A macro to search for your term on a large multilingual website such as that of the European Union from within Word. This macro targets the section of the EU website containing translations side by side (“parallel texts”) on the same page, saving you precious time. This macro essentially launches a Google search from within Word, adding specific search criteria, for example: “your search term” inurl:eur-lex.europa.eu. Once you have opened a page on the EU website, all you need to do is specify your target language under “Multilingual display” to view source and target language side by side.
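The search criteria in these scenarios all follow one pattern: the term in quotes plus one or more site: or inurl: restrictions. The sketch below shows how such a Google search address can be assembled. It is written in Python rather than Word's VBA and is not one of the macros offered here; the function name and its defaults are hypothetical. It writes OR in capitals, which is the form Google treats as an operator.

```python
from urllib.parse import quote_plus


def google_search_url(term, sites=(), operator="site"):
    """Build a Google search URL for an exact phrase, optionally limited to some sites."""
    query = f'"{term}"'
    if sites:
        # e.g. site:www.ubs.com OR site:www.raiffeisen.ch
        query += " " + " OR ".join(f"{operator}:{site}" for site in sites)
    return "https://www.google.com/search?q=" + quote_plus(query)
```

Passing the result to webbrowser.open() from the Python standard library opens the search in the default browser, which is roughly what a click on the macro icon does from within Word.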

See a couple of these macros in action:
https://www.youtube.com/watch?v=XlvBLgJPaFk

These and more macros are available for free at https://www.facebook.com/groups/TranslatorsSwitzerland/

The macros themselves are written by a translator with translators' needs in mind and can be adapted to your specific requirements.

Macros may also be created to automate the web-based terminology research techniques for translators found at
http://www.multilingual.ch/Search_Interfaces.htm
... reducing them, too, to a single click in Word!

The original search techniques on which these macros are based were featured in the book entitled “Google Hacks” (“Hack #19: Google Interface for Translators”) by Tara Calishain and Rael Dornfest.

*******

Tanya Harvey Ciampi, Dipl. DOZ (Zurich)
English translator (DE/FR/IT>EN)
6673 Maggia, Switzerland, www.multilingual.ch

Tanya grew up in Buckinghamshire, England, and went on to study in Zurich, where she obtained her diploma in translation. She now lives in the Ticino, the Italian-speaking region of Switzerland, where she works as an English translator (from Italian, German and French) and proofreader.

Jul 26, 2017

Shortcuts to managing bitext corpora and terminologies in free Google Sheets

When I presented various options for using spreadsheets available in the free Google Office tools suite on one's Google Drive, I was asked if there wasn't a "simpler" way to do all this.

What's simple? The answer to that depends a lot on the individual. Yes, great simplicity is possible with using the application programming interface for parameterized URL searches described in my earlier articles on this topic:

The answer is yes. However, there will be some restrictions to accept regarding your data formats and what you can do with them. If that is acceptable, keep reading and you'll find some useful "cookie cutter" options.

When I wrote the aforementioned articles, I assumed that readers unable to cope with creating their own queries would simply ask a nerdy friend for five minutes of help. But another option would be to use canned queries which match defined structures of the spreadsheet.

Let's consider the simplest cases. For anything more complicated, post questions in the comments. One can build very complex queries for a very complex glossary spreadsheet, but if that's where you're at, this and other guns are for hire, no checks accepted.

You have bilingual data in Language A and Language B. These can be any two languages, even the same "language" with some twist (like a glossary of a modern standard English with 19th century thieves' cant from London). The data can be a glossary of terms, a translation memory or other bitext corpus, or even a monolingual lexicon (of special terms and their definitions or other relevant information). The fundamental requirement is that these data are placed in an online spreadsheet, which can be created online or uploaded from your local computer, and that Language A be found in Column A of the spreadsheet and Language B (or the definition in a monolingual lexicon) in Column B of the spreadsheet. And to make things a little more interesting we'll designate Column C as the place for additional information.

Now let's make a list of basic queries:

Search for the text you want in Column A, return matches for A as well as information in Column B and possibly C too in a table in that order
Search for the text you want in Column B, return matches for B as well as information in Column A and possibly C too in a table in that order
Search for the text you want in Column A or Column B, return matches for A/B and possibly C too in a table in that order

Query 1: searching in Column A

The basic query could be: SELECT A, B WHERE A CONTAINS '<some text>'

Of course <some text> is substituted by the actual text to look for enclosed in the single straight quote marks. If you are configuring a web search program like IntelliWebSearch or the memoQ Web Search tool or equivalents in SDL Trados Studio, OmegaT or other tools, the placeholder goes here.

If you want the information in the supplemental (Comment) Column C, add it to the SELECT statement: SELECT A, B, C WHERE A CONTAINS '<some text>'

The results table is returned in the order that the columns are named in the SELECT statement; to change the display order, change the sequence of the column labels A, B and C in the SELECT, for example: SELECT B, A, C WHERE A CONTAINS '<some text>'

Query 2: searching in Column B

Yes, you guessed it: just change the column named after WHERE. So

SELECT B, A, C WHERE B CONTAINS '<some text>'

for example.

Query 3: searching in Column A or Column B (bidirectional search)

For this, each comparison after the WHERE should be grouped in parentheses:

SELECT A, B, C WHERE (A CONTAINS '<some text>') OR (B CONTAINS '<some text>')

The statement above will return results where the expression is found in either Column A or Column B. Other logic is possible: substituting AND for the logical OR in the WHERE clause returns a results table in which the expression must be present in both columns of a given record.

And yes, in memoQ Web Search or a similar tool you would use the placeholder for the expression twice. Really.
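The three canned queries differ only in which columns are searched and in what order the columns are returned, so they can be generated by one small function. A minimal Python sketch follows; the function names are hypothetical, and the fallback to double quote marks for text containing an apostrophe is an assumption about the query language that should be checked against Google's documentation.

```python
def quote_literal(text):
    """Wrap text in quote marks for use as a string literal in a query."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        # Assumption: double quote marks are accepted as an alternative.
        return f'"{text}"'
    raise ValueError("text contains both single and double quote marks")


def build_query(text, search_columns=("A",), select_columns=("A", "B", "C")):
    """Build a SELECT ... WHERE ... CONTAINS query for a glossary spreadsheet."""
    literal = quote_literal(text)
    conditions = [f"{column} CONTAINS {literal}" for column in search_columns]
    if len(conditions) > 1:
        # Group each comparison in parentheses and join them with OR.
        where = " OR ".join(f"({condition})" for condition in conditions)
    else:
        where = conditions[0]
    return f"SELECT {', '.join(select_columns)} WHERE {where}"
```

For a web search tool, the placeholder itself can be passed as the text: build_query("{}", ("A", "B")) yields the bidirectional query with the placeholder in both comparisons.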

Putting it all together

To make the search URL for your Google spreadsheet three parts are needed:

The base URL of the spreadsheet (look in your browser's address bar; in the address https://docs.google.com/spreadsheets/d/1Bm_ssaeF2zkUJR-mG1SaaodNSatGdvYernsE7IJcEDA/edit#gid=1106428424, for example, the base URL is everything before /edit#gid=1106428424),
The string /gviz/tq?tqx=out:html&tq= and
Your query statement created as described above

Just concatenate all three elements:

{base URL of the spreadsheet} + /gviz/tq?tqx=out:html&tq= + {query}

An example of this in a memoQ Web Search configuration might be:

https://docs.google.com/spreadsheets/d/1Bm_ssaeF2zkUJR-mG1SaaodNSatGdvYernsE7IJcEDA/gviz/tq?tqx=out:html&tq=SELECT B, A WHERE (A CONTAINS '{}') OR (B CONTAINS '{}')

and here you can see a search with that configuration and the characters 'muni' : https://goo.gl/D5cQmh
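The concatenation of the three parts can be sketched in a few lines of Python. The function names are hypothetical; the sketch also percent-encodes the query, which is what the browser does behind the scenes when you type the query in clear text.

```python
from urllib.parse import quote

GVIZ = "/gviz/tq?tqx=out:html&tq="


def base_url(edit_url):
    """Return the spreadsheet address up to, but not including, /edit."""
    return edit_url.split("/edit", 1)[0]


def search_url(edit_url, query):
    """Concatenate the base URL, the gviz string and the URL-encoded query."""
    return base_url(edit_url) + GVIZ + quote(query, safe="")
```

The edit_url argument is simply the address copied from the browser's address bar while the spreadsheet is open.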

Adding custom labels to the results table

If you clicked the short URL given as an example above, you'll notice that the columns are unlabeled. Try this short URL to see the same search with labels: https://goo.gl/3zJQqK

This is accomplished simply by adding LABEL A 'Portuguese', B 'English' to the end of the query string.
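If the queries are generated by a script, the LABEL clause can be appended the same way. A small Python sketch with a hypothetical helper name:

```python
def with_labels(query, labels):
    """Append a LABEL clause such as LABEL A 'Portuguese', B 'English' to a query."""
    if not labels:
        return query
    clause = ", ".join(f"{column} '{heading}'" for column, heading in labels.items())
    return f"{query} LABEL {clause}"
```

The labels are given as a mapping from column letter to heading, for example {"A": "Portuguese", "B": "English"}.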

If you look at the URL in the address bar for any of the live web examples you'll notice that space characters, quote marks and other stuff are substituted by codes. No matter. You can type in clear text and use what you type; modern browsers can deal with stuff that is ungeeked too.

To do more formatting tricks, RTFM! It's here.

Jun 6, 2017

Build your own online reference TM for a team or anyone!

In the past, I have published several articles describing the use of free Google Sheets as a means of providing searchable glossaries on the Internet. This concept has continued to evolve, with current efforts focused on the use of forms and Google's spreadsheet service API to provide even more free, useful functionality.

On a number of occasions I have also mentioned that the same approaches can be used for translation memories to be shared with people having different translation environments, including those working with no CAT tools at all. However, the path to get there with a TM might not be obvious to everyone, and the effort of finding good tools to handle the necessary data conversions can be frustrating.

I've put up a demonstration TM in Portuguese and English here: https://goo.gl/LXXgmf

Here is a selection from the same data collection, selecting for matches of the Portuguese word 'cachorro': https://goo.gl/9KJils
This uses the same parameterized URL search technique described in my article on searchable glossaries.

A translation memory in a Google Sheet has a few advantages:

It can be made accessible to anyone or to a selected group (using Google's permission scheme)
It can be downloaded in many formats for adding to a TM or other reference source on a local computer
Hits can also be read in context if the TM content is in the order it occurs in the translated documents. This is an advantage currently offered in commercial translation environment tools only by memoQ LiveDocs corpora.

Web search tools of many kinds can be configured easily to find data in these online Google Sheet "translation memories" - SDL Trados Studio, OmegaT and memoQ are among those tools with such facilities integrated, and IntelliWebSearch can bridge the gap for any environment that lacks such a thing.

But... how do you go from a translation memory in a CAT tool to the same content in a Google Sheet? This can be confusing, because many tools do not offer an option to export a TM to a spreadsheet or delimited text file. Some suggestions are found in an old PrAdZ thread, but I found a more satisfactory way of dealing with the problem.

A few years ago, the Heartsome Translation Studio went free and Open Source. It contains some excellent conversion tools. I downloaded a copy of the Heartsome TMX Editor (the available installers for Windows, Mac and Linux are here) and used it to convert my TMX file.

The result was then uploaded to a public directory on my personal Google Drive, and the URL was noted for building queries. Fairly straightforward.
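For those who would rather script the conversion, here is a minimal Python sketch. It is not the Heartsome route described above but a stand-in using only the standard library, and it assumes a TMX file whose translation units carry the language code in an xml:lang (or older lang) attribute. The function name is hypothetical. The resulting CSV file can be uploaded to Google Drive and opened as a Google Sheet.

```python
import csv
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_csv(tmx_source, csv_target, lang_a, lang_b):
    """Write one CSV row per translation unit: lang_a in column A, lang_b in column B.

    lang_a and lang_b are lowercase primary language codes such as "pt" and "en".
    Returns the number of rows written.
    """
    rows = 0
    writer = csv.writer(csv_target)
    for tu in ET.parse(tmx_source).getroot().iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            code = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Keep only the primary subtag ("pt" from "pt-PT") and drop inline tags.
                texts[code.split("-")[0]] = "".join(seg.itertext())
        if lang_a in texts and lang_b in texts:
            writer.writerow([texts[lang_a], texts[lang_b]])
            rows += 1
    return rows
```

When writing to disk, open the target with open("tm.csv", "w", newline="", encoding="utf-8") so that the csv module controls the line endings. Units that lack one of the two languages are skipped.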

The Heartsome TMX Editor seems like it might be a useful tool to replace Olifant as my TMX editor. While the TM editor in my tool of choice (memoQ) has improved in recent years, it still does not do many things I require, and some of this functionality is available in Heartsome.

May 23, 2017

IntelliWebSearch: really the best Windows-based search tool for translators.

When I began using Michael Farrell's IntelliWebSearch (IWS) about a year ago, shortly before a few IAPTI webinars on that subject, I was impressed with the tool's flexibility, but one thing drove me nuts: the browser kept adding tabs with each search, unlike the tool I favored at the time, memoQ Web Search. But the latter is restricted to use within memoQ, so I had some hope of sorting out the problem with IWS.

I asked the program's author for a solution, but I think I failed to articulate the problem properly: I was told that this was simply a shortcoming I would have to live with. Not true. Michael's tool is better than he said.

The solution turned out to be in the program's settings, which are accessed under the Edit menu.

An example of "improved" settings more to my taste is above. The important thing for me to get the behavior I wanted was to define the return behavior. Use the return shortcut and close the browser. Subsequent actions can include pasting any copied text if you like.

Of course, adding extra tabs to the open browser is not such a bad thing in some cases, providing a sort of tab-based "history" of the searches. And simply using the search window shortcut opens an IWS window with text copied to a search field, where individual searches can be launched in the browser of choice using icons for various configured searches.

The much greater flexibility of IntelliWebSearch, its universal application in any Windows software, its memory stability (memoQ Web Search has had a serious memory leak for a long time, resulting in crashes and other troubles) and its very modest price for licenses after a 2-month trial make it my search tool of choice now that I can get the browser window behaviors I want. And various "profiles" for searching can be saved in external files for backup and sharing with others.

For educational and professional use, this is a superb choice. The program can also be linked to local information, such as CD-based dictionaries or desktop search tools. Check it out!

Dec 27, 2016

Free shareable, searchable glossaries for collaboration with anyone

Some years ago I suggested a procedure using Google spreadsheets for glossary collaboration in projects. Many people do this sort of thing now.

What I do not think most are doing, however, is accessing these web-based term lists efficiently as terminology resources in their work. It's hard to compete with the efficiency of integrated termbases, TMs, web search features, etc.

... unless of course you integrate a web search for those online spreadsheets which returns just the few data of interest.

Matches found for German "ladepresse" in a glossary of a few thousand hunting terms

This is fairly straightforward using Google's visualization API with a simple query. A parameterized URL can be built to perform custom searches of your own data or data shared by colleagues or clients. "Canned" queries can be easily incorporated in custom searches from many tools, including memoQ Web Search, IntelliWebSearch and others.

Building a custom search URL for your Google spreadsheet is fairly simple. In the example above it consists of three parts:

{base URL of the spreadsheet} + /gviz/tq?tqx=out:html&tq= + {query}

The middle part (/gviz/tq?tqx=out:html&tq=) invokes the Google visualization API and specifies that the query results be returned as HTML (for display in a browser). The query language is similar to SQL, but if you use a prepared query for a given spreadsheet table structure, you don't need to learn any of that. Queries can be made which also return definitions, images, context examples or anything else that might reside in columns of interest in the online spreadsheet.
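As a rough illustration of the three-part URL described above, here is a minimal Python sketch. The spreadsheet address, the column letters and the function name are all assumptions made up for this example, not details of my actual glossary; the query text uses the "select ... where ... contains" form of the Google visualization query language.

```python
from urllib.parse import quote


def glossary_search_url(sheet_base_url, term, columns="A, B", search_column="A"):
    """Build a gviz URL that returns rows whose search column contains the term.

    sheet_base_url, columns and search_column are hypothetical placeholders;
    adapt them to the structure of your own spreadsheet.
    """
    # Crude safeguard for this sketch: drop single quotes rather than escape them.
    safe_term = term.lower().replace("'", "")
    query = f"select {columns} where lower({search_column}) contains '{safe_term}'"
    # quote() percent-encodes spaces, commas, quotes and parentheses in the query.
    return f"{sheet_base_url.rstrip('/')}/gviz/tq?tqx=out:html&tq={quote(query)}"


# Hypothetical spreadsheet address, for illustration only.
BASE = "https://docs.google.com/spreadsheets/d/SPREADSHEET_ID"
print(glossary_search_url(BASE, "ladepresse"))
```

In a tool like IntelliWebSearch or memoQ Web Search, the term would be replaced by whatever placeholder that tool uses for the selected text, so the script is only a way to check that the URL pattern works before configuring it there.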

Using a tool like IntelliWebSearch or integrated extensions of OmegaT, memoQ and other tools, users working with any sort of tools can share a live glossary. Google Spreadsheets also have some permissions/security features which can be investigated if needed.

Of course other data can be shared this way, including TMs or XLIFF data as well as monolingual information. A little study of the relevant Google documentation reveals many possibilities :-)

Getting the picture with automated web searches

Like many other translators, I have come to appreciate the value and the complications of Internet searches in my work. As the garbage accumulated on the World Wide Web grows ever deeper, focused searches are more important than ever to get past the noise to find the information required, then get back to work.

Integrated tools for focused searches on multiple web sites are popular with many. IntelliWebSearch (IWS), memoQ Web Search and similar tools can be an enormous boost to productivity. But I doubt that many people give much thought to optimizing that possibility in general or for particular jobs.

Google searches are very popular. The Advanced Search features are particularly useful. For example, I find translating Austrian legal texts to be difficult sometimes, because an ordinary Google search of relevant legal terms yields too much interference from sites in other German-speaking countries. However, a search configured like this:

https://www.google.com/search?as_epq=schwerer+Betrug&lr=lang_de&as_sitesearch=www.jusline.at

will yield only results in German from the Austrian site Jusline, which is very helpful if I am looking for the specific definition of "schwerer Betrug" in the jurisprudence of that country.

Similarly, a financial translator working with Austrian texts might use a search like

https://www.google.pt/search?as_epq=Umlage&as_sitesearch=www.afrac.at
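The two searches above follow the same pattern, so they can be generated rather than typed. This is only a sketch: the function name and its arguments are my own invention for illustration, while the parameters as_epq, as_sitesearch and lr are the ones that appear in the URLs above.

```python
from urllib.parse import urlencode


def site_phrase_search(phrase, site, language=None, google_host="www.google.com"):
    """Build a Google search URL for an exact phrase restricted to one site."""
    params = {"as_epq": phrase, "as_sitesearch": site}
    if language:
        # Same form as lr=lang_de in the Jusline example.
        params["lr"] = f"lang_{language}"
    # urlencode() turns spaces into "+" and joins the pairs with "&".
    return f"https://{google_host}/search?{urlencode(params)}"


print(site_phrase_search("schwerer Betrug", "www.jusline.at", language="de"))
print(site_phrase_search("Umlage", "www.afrac.at", google_host="www.google.pt"))
```

The parameter order in the first result differs from the URL I wrote by hand, which should make no difference to the search.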

In my technical work, very often I must look for images of a component or process described. For a long time I did this inefficiently: searched Google and then clicked the Images link and waded through the chaos to find what I needed. But if I am translating the catalog of the hunting supplier Frankonia, that's stupid. I can do a very specific search like this:

https://www.google.com/search?q=wildbergehaken+site:frankonia.de&tbm=isch

which will open a Google Images search directly (that's what the argument tbm=isch does), using only pictures culled from the site of the retailer whose material I am working on.

An image search using Wikipedia.org can often be very helpful to identify an unknown term and navigate to related articles in various languages. For example, a person encountering an unknown word in Russian might use this search:

https://www.google.com/search?as_epq=собака&as_sitesearch=wikipedia.org&tbm=isch

and quickly see what the term is about.

The search results above were obtained with memoQ Web Search, where I have the Wikipedia image search preconfigured:

Astute readers may notice the slight difference in syntax between the search in the screenshot and the Russian example I gave. There is more than one way to skin a cat with web searches. Or a dog in this case. To restrict searches to the wiki for one particular language just add the prefix for that subsite to the URL, de.wikipedia.org for German, for example.
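For those who like to test such URLs before configuring them in a tool, here is a small sketch of the image search with an optional language prefix. It uses the "site:" form from the Frankonia example together with tbm=isch; the function name and arguments are hypothetical, and this is not necessarily the exact syntax of my preconfigured memoQ search.

```python
from urllib.parse import urlencode


def wikipedia_image_search(term, wiki_language=None):
    """Build a Google Images URL restricted to Wikipedia or one language subsite."""
    # de.wikipedia.org for German, plain wikipedia.org for all languages.
    site = f"{wiki_language}.wikipedia.org" if wiki_language else "wikipedia.org"
    params = {"q": f"{term} site:{site}", "tbm": "isch"}
    # safe=":" keeps the colon in "site:" readable instead of encoding it as %3A.
    return "https://www.google.com/search?" + urlencode(params, safe=":")


print(wikipedia_image_search("Hund", "de"))
print(wikipedia_image_search("собака"))
```

Non-Latin terms such as the Russian example come out percent-encoded, which browsers handle without trouble.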

If you need to do such searches from many different applications under Windows, IntelliWebSearch might be a better choice for the preconfigured searches. I think it also handles a lot of tabs better, and it uses the ordinary browser setup instead of the more restricted options of memoQ's integrated mini-browser. I don't really like the fact that IWS keeps adding tabs to the browser, so I close it between searches, and to avoid messing up other work I am doing in Chrome (my default browser), I configure IWS to use another browser like Opera or Microsoft Edge.

Anyone who would like the light resource file for one of my German/English profiles for memoQ's web search can get it here. It includes the image search in Wikipedia and has a number of (mostly deactivated) custom search tabs useful for intellectual property translation. A few of the searches are for engines which require manual input of terms, but I find it convenient to have these on a tab for quick access.

Oct 21, 2016

A day in the life....

One of the things I enjoy most about professional translation is the range of activities and subject matters that one can encounter, even as a specialist in a few domains. I can't say the work is never boring, but when it does drift that way, very suddenly it isn't any more. Quite unpredictably.

Yesterday I typed translations. A bit more than expected after two sets of PowerPoint slides - a small one to translate from German and another to edit the rather acceptable English - turned out to have about 8,000 words of highly specialized slide notes about military command and control structures and the technology of fighting forest fires. (Note to self: no matter how busy you are, always import those presentations into memoQ with the options set to extract every kind of text as well as the bitmap graphics if you have to translate those too. Then do a word count! Appearances can be deceiving.)

Yesterday I dictated translations. The job started out as a bunch of text fragments from slides, where context über alles was the rule, lots of terminology required research, and voice recognition offered no particular advantages, then suddenly it became the translation of a rather long lecture using all that new terminology, and the deadline was tighter than thumbscrews operated by an angry ex-girlfriend. Dragon NaturallySpeaking to the rescue. Not only was this necessary to finish the text in a long workday rather than most of a week, but the more natural style of translation by dictation suited the purpose of the translated presentation particularly well. I could imagine myself in the room with equipment vendors, military commanders, firefighting specialists and freight forwarders, talking about the challenges faced and the technology required to avoid the tragedies of an out-of-control firestorm. And the words came out, transcribed from my voice directly into the target text fields of memoQ, exactly as they should be spoken to that audience. And at the end of that long day my hands still had feeling in them, which would not have been the case if I had typed even a third of the text.

Yesterday I made a specialized glossary to share with a presenter who will travel halfway around the world to lecture with the slides I translated for his talk. Long ago I discovered that the way I produce translations has the potential to provide additional benefits for those who will use my work. Sales representatives might need to write letters to their prospects, discussing their products in a language not mastered as a native, and the vocabulary from my work may help them to improve communication and avoid confusion that might result from using incorrect or simply different words to describe the same stuff. Or an attorney might need a quick overview of the language I used to translate the pleading she intends to file, to ensure that it is consistent with previous efforts and will not complicate discussions with her client. The terminology I research and record for each translation can be exported and reformatted quickly to produce glossaries or more complex dictionaries in a variety of formats suited for purpose. Little time and often a lot of benefits for my clients.

Yesterday I translated bitmap graphics and not only had to deal with the editing tools for that but also had to consider the best strategy for transforming the original German graphics into English ones. Would those charts be translated again into other languages? Would the graphics be re-used in other types of documents, so that I should consider ease of portability in my approach to the translation? And how the Hell do I actually use that new bitmap graphics transcription and substitution for Microsoft Office files which was added to memoQ some time ago and sort out the five charts to translate from the fifty to ignore? (Maybe I should blog the solutions some day.)

And yesterday I was asked to write summaries of large, badly scanned articles so that the equipment manufacturer would understand how its latest technology was discussed by German reviewers. As a kid I had a silly fantasy about getting paid to read, and this is just one of the many ways it unexpectedly came true. But before I get that far, these scanned files needed to be reworked so that they could be read and searched on the screen, so as I described in a guest post on another blog some years ago, I converted them to searchable PDF/A with ABBYY FineReader, which in this case also reduced their size by about 75%. The video below also shows how this works. Strangely, when I describe this procedure to other translators, many of them don't get it, and they go on about converting PDF files into editable MS Word files or plain text, or, God help them, something really stupid like importing PDF files directly into a CAT tool for translation, though none of this really relates to my purpose. Conversions often contain errors, and many texts are harder to interpret when the context of an accurate layout is lost. So "text-on-image" PDF files for translation reference to the original source files are often critical, and for files to summarize or consult sporadically for reference (with many pages to look at and essentially nothing to translate), a searchable PDF is the gold standard for efficient work.

In the course of that day I had to work with two computers linked by remote access using four networks at various times, working in German, English and Portuguese (the latter mostly involving questions to the housekeeper on how to do an online pizza delivery order so I could stay in the office and keep working). I used well over a dozen software applications for necessary tasks. These, and the environments in which they operate, must be balanced carefully for efficient work. And even after some months in my new office, the balance isn't quite as good as I've had it before, and more attention to ergonomics is required.

Some colleagues are nostalgic for the "good old days" when they received a stack of paper to translate and sent off another stack of paper when the work was done, and they had a filing cabinet or a shelf of notebooks full of old work to use as reference material, and boxes of index cards stuffed full of scribbled notes on terminology next to seldom-dusty specialist dictionaries prepared by presumed experts, often full of marginalia commenting on errors or omissions and stuffed with papers bearing other scribbled notes. Not me. Since the day 30 years ago when I laboriously typed a text file full of file folder numbers and content descriptions for my research work and personal papers I have been a big believer in electronic retrieval of information wherever possible, and I miss retyping botched pages just as little as I miss the lines in the post office or the stress of dealing with delivery services.

I suspect that some feel a loss of control with the advent of new technologies in an old profession, and certainly the changes in the business environment for translation since the days of the typewriter often require a very different mentality to survive and thrive. What that mentality is, exactly, is a matter of healthy debate and often misunderstanding - again, because of the great diversity of the profession and the professionals and unprofessionals in it.

The greatest challenges of new technologies that I find are the same as those faced in many other kinds of work and in modern life in general. Filtering the overabundance of input for the few things that are truly of use or interest and maintaining focus and calm amidst omnipresent distractions. Not relying too much on technologies that are far more fallible than most people, even experts, realize or acknowledge. And remembering that a fool with a tool, however many features and failsafes it may offer, remains a fool.

Mar 17, 2016

Dynamic filtering with regular expressions in memoQ

Regular expressions (aka regex) are not a tool for everyone, though this is something that the nerdily inclined often fail to appreciate. For average users, a plain language query interface, perhaps with more limited options, is generally more accessible and more widely used. However, sometimes it's nice to have such "shortcuts" available to select particular structures in a text for translation or editing, and the many people who complained for years that Kilgray did not provide a dynamic regex filter for the working translation grid - a feature of SDL Trados Studio for quite a while now - did have a point worth addressing in development. Now that has happened, though still a bit incompletely when considered in the full scope of memoQ's usual features for selecting text.

memoQ uses regex in a number of its modules, and Kilgray has several webinars which describe these applications, though they require some stamina to watch, and I expect that most people will become hopelessly confused if they try to take in more than one area of application in a single sitting. The uses of regex for segmentation rules, tagging, autotranslatables and text filtering on document import (with the Regex Text Filter) are very different in their approach, even though the underlying syntax of the regex is the same. However, all of these applications allow the configured rules to be saved and re-used, so one could ask an expert to create the settings needed and provide these in a resource file, and many users do exactly that. Thus as long as one understands that regex can be used for a particular problem, the details can be hired out.

This new application of regex for dynamic filtering, introduced in recent builds of memoQ 2015, is a little different (at present). Although the Find/Replace dialog will "remember" regex syntax in its dropdown menu of recent expressions, there is no way to store these expressions, and they must be entered manually to use them. This means that, for now, the average user will have to collect useful expressions like a tourist might scribble phrases in a notebook to use on holiday in a foreign country, and those with a little more sense of adventure might find themselves with a hovercraft full of eels and wonder why.

One such phrase might be the example in the screenshot above. I was translating some financial statements with several formats present for digits in account numbers, dates and monetary expressions. In order to work more systematically with these various formats, I used several different regex expressions to sort and separate them. In the example I was looking for instances where at least four digits were written together in a source segment. That isn't terribly selective, but most of these occurrences in my documents were account numbers, and this helpfully cleaned up the text a lot and allowed me to work a little faster. Other expressions were used to QA date formats and monetary expressions more specifically.
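For readers without the screenshot at hand: a pattern along the lines of \d{4,} (four or more digits in a row) does this kind of job. The Python sketch below tries it on a few invented sample segments; the segments and names are hypothetical, and memoQ's regex flavor may differ from Python's in some details, though this simple pattern should behave the same way.

```python
import re

# Four or more consecutive digits anywhere in the segment.
FOUR_OR_MORE_DIGITS = re.compile(r"\d{4,}")

# Invented sample source segments, not taken from a real project.
segments = [
    "Konto 4400 Umsatzerlöse",
    "Stand zum 31.12.2015",
    "Summe 1.250,00 EUR",
    "Rückstellungen (Konto 39000)",
]


def filter_segments(pattern, texts):
    """Return only the texts in which the pattern matches somewhere."""
    return [text for text in texts if pattern.search(text)]


for segment in filter_segments(FOUR_OR_MORE_DIGITS, segments):
    print(segment)
```

Note that the date segment is caught as well because of its four-digit year, which shows why the expression is not terribly selective, while the amount with a thousands separator slips through.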

In the working grid for translation and editing, regular expressions can be used in one or both of the fields for the source and target text when the checkbox in the toolbar at the right is marked. Or the regular expressions option in the Find/Replace dialog can be used.

It is somewhat disappointing that regex cannot be used to create static views at the present time. While marking can be used in the Find dialog to enable one to go back and forth between the filter criteria and other configurations of the working grid, there is no way to make a permanent "record" of the filtered segments. For quite a few years, I have wished for the possibility to save the results of my filtering in the working grid in some sort of view, but I was always able at least to recreate the filtering criteria in the dialog to create a memoQ View, which could then be opened at any time or exported in various formats for clients and project collaborators. However, at the moment that is not possible with regex filtering. (There are workarounds involving a change in segment status, but these are often inconvenient in a project in progress.)

The addition of regex filtering to the working grid in memoQ is a welcome feature for many, which I hope will be expanded by Kilgray in the future to achieve more of its potential. But to take advantage of this potential in any way, the average user will indeed need a "phrase book" of sorts, and an efficient way of managing useful collected regex snippets (and naming them for easier re-use in searches and filtering) would be very desirable. If these "regex phrase books" for dynamic filtering and view creation were able to be saved as shareable light resources, it would be possible to build many useful collections to help users at all levels in the translation, editing and quality assurance tasks.

Jan 8, 2014

Multiple, separate concordances with memoQ

In the comments of my recent post on the memoQ TM search tool, I mentioned a possibility for using that feature to "de-junk" and simplify concordance searches.

In the example above, I am searching the 2 million translation unit EU DGT TM using text selected in a memoQ 6.2 project. Working this way offers me the following advantages:

I can separate the concordances for my project from a big reference dataset I only need for certain lookups.
A simple copy command (Ctrl+C) automatically looks up text in either language in the TM search tool.
If I want to avoid any possibility of unintended "leakage" of data from certain TMs in the project, selecting them for use in the memoQ TM search tool ensures that their content will never be "accidentally" inserted as an ordinary TM match as I work.

Note also that I am using a feature of memoQ 2013 R2 (6.8) to do searches while working in an older version of memoQ. I could do the same if I were working in a web translation interface for memoQ (which does not allow me to attach my own TMs) or any other translation environment.

I remember an argument with a translation agency owner about a year and a half ago. The man told me quite insistently about his intent to force even translators with memoQ to use the web translation interface so that he could restrict them to the use of the client-specific TMs he maintained. With the use of the TM search tool, a reasonable compromise is achieved for TM data at least. (LiveDocs and termbase access remains a bit more cumbersome, however, though by setting up a dummy project with termbases, corpora and particular TMs attached, one could actually use three separate concordance sets. That could be interesting.)

In any case, the possibility of a separate concordance for handling large data volumes separately from one's main TMs and the possibility of doing this even while using older versions of memoQ may be a reason why those who do not yet want to do their routine work in the latest version or cannot do so can still benefit from upgrading now and installing the latest version alongside the old version(s).

Jan 2, 2014

The memoQ TM search tool

Release 2 of memoQ 2013 included a new utility which allows memoQ translation memories to be used for lookups, the TM search tool:

When working in other translation environment tools such as SDL Trados Studio or Wordfast, translating text in a word processor or reading PDF files and web pages, selected text can be looked up directly in chosen translation memories and text from the source or target of a translation match can be put in the Clipboard for pasting into the other application. Relevant keyboard shortcuts are:

Ctrl+Shift+Q	Starts the memoQ TM search tool, immediately searches for any text on the Windows Clipboard.
Ctrl+C	Copies new text to the TM search window and executes a search.
Ctrl+Alt+C	Copies the target text of a selected match to the Clipboard.
Ctrl+Shift+C	Copies the source text of a selected match to the Clipboard.
Ctrl+V	Pastes the Clipboard text into another application.

A translation memory selected in the TM search tool cannot be opened or used in memoQ while the search tool is active. An orange lightning bolt is displayed in the TM list of the Search settings to indicate this status. After the search tool is closed, the TM is available again for use in memoQ.

Although the initial version of this tool is quite useful, many users have realized that further refinements of its features would make its application more flexible and effective. Some suggestions so far include

selecting/deselecting all TMs
filtering TMs by metadata
saving and loading profiles (collections of particular TMs and settings)
indicating match sources (i.e. TM, preferably with metadata)

A number of other quirks, like the ability to launch multiple instances of the tool, also still need to be sorted out as of Build 52.

I hope that Kilgray will take the further development of this tool seriously and consider how to improve and expand it, perhaps to include remote translation memories as well. The current version of the TM search tool requires a memoQ license on the computer where it is used, but separate licensing could also be quite interesting. This could be useful, for example, in collaborative projects with partners who use different tools and working methods or for those who want to use memoQ translation memories as bilingual concordances. I see the potential for a value-added service here if I can provide such a concordance (for a fee) to an end client, perhaps with some sort of protective encapsulation for the memories provided. Inclusion of termbases and LiveDocs corpora in future versions of the tool could also prove interesting. memoQ could become a reference information packaging platform to create additional communication services for our clients. There are interesting possibilities for mobile applications here as well. But in the meantime I'll settle for the modest improvements in the bullet points above.

Further information on the memoQ search tool can be found in the Kilgray knowledgebase.
