Showing posts with label IntelliWebSearch. Show all posts

Jul 31, 2019

URL-based searches of your Google Drive


Just before a recent short holiday, I ran across an article from 2017 which described how to search Google Drive directly from Chrome's address bar. "Interesting," I thought, and with the possibility of integrating such Google Drive searches with IntelliWebSearch or memoQ's integrated web search feature (or similar features in other environments) in mind, I shared the link with a few friends.

Google Drive and its application suite, which includes Google Docs (the word processor) and Google Sheets (the spreadsheet application), offer many possibilities for helping in language projects, collaborative and otherwise. I have written extensively about these possibilities with terminology (here, for example, and in a number of related articles). But these earlier investigations involved specific documents and viewing these - or selected portions of them - in a web browser window. Searching a number of files of various types on one's Google Drive ("My Drive") or a subfolder thereof is a little different, and possibly more useful in some circumstances, such as a group project where multiple participants contribute to a shared reference folder (though this folder will have to be added to the "My Drive" of each collaborator).

Google's Help for the relevant search function explains:
You can find files in Google Drive, Docs, Sheets, and Slides by searching for:
  • File title
  • File contents
  • Items featured in pictures, PDF files, or other files stored on your Drive
You can only search for files stored in My Drive. Files stored in folders shared with you won't appear in your search unless you add the folders to My Drive.
 
You can also sort and filter search results.
It all starts with a basic URL, such as
https://drive.google.com/drive/search?q=SOMETEXT
Execute that in your browser's address bar, replacing SOMETEXT with your desired search expression, and you'll get a hit list of all files on your Google Drive which include that text in the title or contents. In a tool like memoQ Web Search, SOMETEXT is replaced by the placeholder the application uses for the search text ({} in the case of memoQ Web Search). With a little experimentation, you'll soon find the additional arguments to search specific file types or folders.
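If you want to build such URLs in a script, or just see what a search tool will send to the browser, the only subtle part is encoding the search text. Here is a minimal Python sketch, assuming only the basic q argument shown above: spaces and accented characters have to be percent-encoded before they go into the URL.

```python
from urllib.parse import quote

# Basic Google Drive search URL from the example above.
DRIVE_SEARCH_BASE = "https://drive.google.com/drive/search?q="

def drive_search_url(search_text: str) -> str:
    """Return a Drive search URL matching search_text in file titles or contents."""
    # quote() percent-encodes spaces and non-ASCII characters (as UTF-8),
    # so the expression survives the trip through the address bar.
    return DRIVE_SEARCH_BASE + quote(search_text, safe="")

print(drive_search_url("turnip"))     # https://drive.google.com/drive/search?q=turnip
print(drive_search_url("Rübe Kohl"))  # https://drive.google.com/drive/search?q=R%C3%BCbe%20Kohl
```

In memoQ Web Search or IntelliWebSearch you don't need code like this; the tool substitutes the selected text at its placeholder. The sketch is simply a way to test a URL pattern before putting it in a search profile.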

For example, if I want to do a search in the "Other" subfolder on my Google Drive, I can discover the URL arguments by starting a manual search and just reading the address bar:


The parameter to use for a specific folder search is "parent", followed by a colon and the coded ID of that folder.
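A folder-restricted search then amounts to appending parent: and the folder ID to the search text in the q argument. Below is a minimal Python sketch of that. The folder ID is a hypothetical placeholder, and I am assuming the space-separated form you see when starting the search manually, so compare the result with what your own address bar shows.

```python
from urllib.parse import quote

def drive_folder_search_url(search_text: str, folder_id: str) -> str:
    """Return a Drive search URL limited to one folder via the parent: operator."""
    # Search text and operator travel together in the q argument,
    # separated by a space; the colon after "parent" is left unencoded.
    query = f"{search_text} parent:{folder_id}"
    return "https://drive.google.com/drive/search?q=" + quote(query, safe=":")

# EXAMPLE_FOLDER_ID is hypothetical: copy the real coded ID of the folder
# (e.g. "Other") from the address bar after a manual search.
print(drive_folder_search_url("turnip", "EXAMPLE_FOLDER_ID"))
```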


An example of a folder search with a specific text segment is in the screenshot above; this was taken while configuring and testing the search in a memoQ Web Search profile. One document containing the search text "turnip" was found in the folder. To view the document, right-click on it in the hit list and choose Preview.

Search inside the preview of a document found in a Google Drive search with memoQ Web Search

Unfortunately there seems to be a bug in memoQ Web Search - which now uses Chromium - because double-clicking the document tries to open it in the old Internet Explorer-based browser, where I was not logged in to Google.

An Internet Explorer window, bizarrely launched by the Chromium-based memoQ Web Search

In fact, you'll have to log in to Google each time you open the memoQ Web Search window (a total nuisance), so it's better to leave it open in the background, even though the current bug in which the web search window is no longer brought to the forefront can make this inconvenient. In other tools this may not be an issue.


The Chromium/IE issue as well as the focus and login hassles with memoQ's web search have been reported to memoQ Support; I look forward to seeing how these are handled. Nonetheless, this Google Drive search seems to have significant potential for individuals and teams to build searchable document collections in the folders of a Google Drive account. Try it in your working environment and share your findings!

Jun 11, 2019

Ergonomic optimization for memoQ windows & more!

Click the graphic to see the mind-blowing details of all you can get on two silly little screens. Imagine two big ones!

How many functional windows do you see for working in the memoQ project of the screenshot here? Do you need more? It's possible. Are you familiar with all the functions shown in this two-screen view of my laptop and a repurposed television screen on my working holiday?

Of course one need not be restricted to just the many undockable, resizable and relocatable windows of memoQ; other third-party tools, like the SDL MultiTerm Widget (for searching SDL term bases in memoQ or other applications) or IntelliWebSearch, which offers many customized, configurable multi-tab web searches with the browser engine of your choice, can be added as needed.

"But wait!", you say. "You can't undock memoQ windows except for the preview, and it's impossible to get enough space to see all the information in the Translation Results pane or see the comparison of large matches well!" Well, here you can. The Translation Results hit list can take the whole height of your screen if you want it to. And you can even see more than one translation and editing grid for files if you need to.

Just because memoQ Support or some expert in the company says stuff like that is not possible doesn't make it so. For something like a decade now I have heard users ask for a lot of layout customization features to improve working ergonomics in memoQ. Heck, I've heard myself beg for that for ages. But typically, one is told how difficult and expensive such efforts are, how there are other priorities, yada yada yada. What, apparently, nobody realized was that while all these discussions were going on, someone actually implemented the requested features, deliberately or otherwise. In any case, somehow that secret never got out. Until I stumbled over it last week while trying to enjoy a few days at the beach.

"How do I get there?" you and David Byrne may ask. Join us for the Best Practices in Translation Technology course from 15 to 20 July (next month) in Lisbon and find out! Or wait until I get around to opening my upcoming online courses, Working Ergonomics in memoQ and New Beginnings with memoQ 9.0, coming soon. Or look in all those memoQ basics tutorials from memoQ Translation Technologies Ltd. on YouTube - something as basic as ergonomics for using the software must be in there somewhere. Or maybe not. Yet.

Or... explore and discover the tricks yourself. And while you're at it, you might find some of the other hidden surprises cleverly concealed in the world's greatest translation environment toolkit.

Dec 29, 2018

memoQ Terminology Extraction and Management

Recent versions of memoQ (8.4+) have seen quite a few significant improvements in recording and managing important terminology in translation and review projects. These include:
  • Easier inclusion of context examples for use (though this means that term information like source should be placed in the definition field so it is not accidentally lost)
  • Microsoft Excel import/export capabilities which include forbidden terminology marking with red text - very handy for term review workflows with colleagues and clients!
  • Improved stopword list management generally, and the inclusion of new basic stopword lists for Spanish, Hungarian, Portuguese and Russian
  • Prefix merging and hiding for extracted terms
  • Improved features for graphics in term entries - more formats and better portability
Since the introduction of direct keyboard shortcuts for writing to the first nine ranked term bases in a memoQ project (as part of the keyboard shortcuts overhaul in version 7.8), memoQ has offered perhaps the most powerful and flexible integrated term management capabilities of any translation environment despite some persistent shortcomings in its somewhat dated and rigid term model. But although I appreciate the ability of some other tools to create customized data structures that may better reflect sophisticated needs, nothing I have seen beats the ease of use and simple power of memoQ-managed terminology in practical, everyday project use.

An important part of that use throughout my nearly two decades of activity as a commercial translator has been the ability to examine collections of documents - including but not limited to those I am supposed to translate - to identify significant subject matter terminology in order to clarify these expressions with clients or coordinate their consistent translations with members of a project team. The introduction of the terminology extraction features in memoQ version 5 long ago was a significant boost to my personal productivity, but that prototype module remained unimproved for quite a long time, posing significant usability barriers for the average user.

Within the past year, those barriers have largely fallen, though sometimes in ways that may not be immediately obvious. And now practical examples to make the exploration of terminology more accessible to everyone have good ground in which to take root. So in two recent webinars, I shared my approach - in German and in English - to how I apply terminology extraction in various client projects or to assist colleagues. The German talk included some of the general advice on term management in memoQ which I shared in my talk last spring, Getting on Better Terms with memoQ. That talk included a discussion of term extraction (aka "term mining"), but more details are available here:


Due to unforeseen circumstances, I didn't make it to the office (where my notes were) to deliver the English talk, so I forgot to show how convenient it is to access the memoQ concordance search of translation memories and LiveDocs corpora during term extraction; this often greatly facilitates the identification of possible translations for a term candidate in an extraction session. This was covered in the German talk.

All my recent webinar recordings - and shorter videos, like the one on playing multiple term bases in memoQ to best advantage - are best viewed directly on YouTube rather than in the embedded frames on my blog pages. This is because all of them since earlier in 2018 include time indexes that make it easier to navigate the content and review specific points rather than listen to long stretches of video and search for a long time to find some little thing. This is really quite a simple thing to do, as I pointed out in a blog post earlier this year, and it's really a shame that more of the often useful video content produced by individuals, associations and commercial companies to help translators is not indexed this way to make it more useful for learning.

There is still work to be done to improve term management and extraction in memoQ, of course. Some low-hanging fruit here might be expanded access to the memoQ web search feature in the term extraction as well as in other modules; this need can, of course, be covered very well by excellent third-party tools such as Michael Farrell's IntelliWebSearch. And the memoQ Concordance search is long overdue for an overhaul to allow proper filtering of concordance hits (by source, metadata, etc.), more targeted exploration of collocation proximities and more. But my observations of the progress made by the memoQ planning and development team in the past year give me confidence that many good things are ahead, and perhaps not so far away.

Jun 25, 2017

NOW is not the National Organization of Words...

... but with over 4 billion of them, that interpretation of the News on the Web corpus at Brigham Young University would be plausible. BYU is known for its high quality research corpora available to the public. The news corpus grows by about 10,000 articles each day, and its content can be searched online or downloaded.

The results are displayed in a highlighted keyword in context (KWIC) hit list with the source publications indicated in the "CONTEXT" column:


As a legal translator, I find the BYU corpus of US Supreme Court Opinions more useful. It displays results in a similar manner:


It is difficult or impossible to configure a direct search in these corpora using memoQ Web Search, IntelliWebSearch or similar integrated web search features in translation environments. However, these tools can be used as a shortcut to open the URL, and the search string can be applied once the site has been accessed. Since I perform searches like this to study context infrequently, a standalone shortcut with IWS serves me best; if I were using this to study usage in a language I don't master very well, like Portuguese (yes there is a Portuguese corpus at BYU - actually, two of them, one historical), then I might include the URL in a set of sites which open every time I invoke memoQ Web Search or a larger set of terminology-related sites in an IntelliWebSearch group.

One great benefit of using such corpora as a language learner is that context and collocations (words that occur together with a particular word or phrase) can be studied easily, better than with dictionaries, enabling one to sound a bit less like an idiot in a second, third, fourth or fifth language. Or for many perhaps, even their first language :-)

Jun 6, 2017

Build your own online reference TM for a team or anyone!


In the past, I have published several articles describing the use of free Google Sheets as a means of providing searchable glossaries on the Internet. This concept has continued to evolve, with current efforts focused on the use of forms and Google's spreadsheet service API to provide even more free, useful functionality.

On a number of occasions I have also mentioned that the same approaches can be used for translation memories to be shared with people having different translation environments, including those working with no CAT tools at all. However, the path to get there with a TM might not be obvious to everyone, and the effort of finding good tools to handle the necessary data conversions can be frustrating.

I've put up a demonstration TM in Portuguese and English here: https://goo.gl/LXXgmf

Here is a selection from the same data collection, selecting for matches of the Portuguese word 'cachorro':  https://goo.gl/9KJils
This uses the same parameterized URL search technique described in my article on searchable glossaries.

A translation memory in a Google Sheet has a few advantages:
  • It can be made accessible to anyone or to a selected group (using Google's permission scheme)
  • It can be downloaded in many formats for adding to a TM or other reference source on a local computer
  • Hits can also be read in context if the TM content is in the order it occurs in the translated documents. This is an advantage currently offered in commercial translation environment tools only by memoQ LiveDocs corpora.
Web search tools of many kinds can be configured easily to find data in these online Google Sheet "translation memories" - SDL Trados Studio, OmegaT and memoQ are among those tools with such facilities integrated, and IntelliWebSearch can bridge the gap for any environment that lacks such a thing.

But... how do you go from a translation memory in a CAT tool to the same content in a Google Sheet? This can be confusing, because many tools do not offer an option to export a TM to a spreadsheet or delimited text file. Some suggestions are found in an old PrAdZ thread, but I found a more satisfactory way of dealing with the problem.

A few years ago, the Heartsome Translation Studio went free and Open Source. It contains some excellent conversion tools. I downloaded a copy of the Heartsome TMX Editor (the available installers for Windows, Mac and Linux are here) and used it to convert my TMX file.




The result was then uploaded to a public directory on my personal Google Drive, and the URL was noted for building queries. Fairly straightforward.
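If the Heartsome tools aren't at hand, the conversion is also simple enough to script. Here is a minimal Python sketch of that alternative (my own, not what Heartsome does internally). It reads a bilingual TMX file with the standard library and writes a two-column CSV file that Google Sheets can import. Assumptions: languages are matched by their primary subtag (pt, en), translation units stay in file order so hits can still be read in context, and inline tags are simplified away, which also drops any translatable text nested inside them (such as <hi>).

```python
import csv
import xml.etree.ElementTree as ET

# ElementTree reports the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def seg_text(seg):
    """Return the plain text of a <seg>, skipping the contents of inline tags."""
    parts = [seg.text or ""]
    for child in seg:
        # child.text holds native formatting codes (<bpt>, <ept>, <ph>...);
        # child.tail is the segment text that follows the tag.
        parts.append(child.tail or "")
    return "".join(parts).strip()

def tmx_pairs(tmx_path, src_lang, tgt_lang):
    """Yield (source, target) pairs in file order, matching primary language subtags."""
    src_lang, tgt_lang = src_lang.lower(), tgt_lang.lower()
    for tu in ET.parse(tmx_path).getroot().iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; some older files use a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            primary = lang.replace("_", "-").split("-")[0].lower()
            seg = tuv.find("seg")
            if seg is not None:
                segs[primary] = seg_text(seg)
        # Skip units that lack text in either language.
        if segs.get(src_lang) and segs.get(tgt_lang):
            yield segs[src_lang], segs[tgt_lang]

def tmx_to_csv(tmx_path, csv_path, src_lang, tgt_lang):
    """Write a header row plus one row per translation unit; return the row count."""
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow([src_lang, tgt_lang])
        for pair in tmx_pairs(tmx_path, src_lang, tgt_lang):
            writer.writerow(pair)
            count += 1
    return count

# Usage (hypothetical file names):
# tmx_to_csv("demo_pt-en.tmx", "demo_pt-en.csv", "pt", "en")
```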

The Heartsome TMX Editor seems like it might be a useful tool to replace Olifant as my TMX editor. While the TM editor in my tool of choice (memoQ) has improved in recent years, it still does not do many things I require, and some of this functionality is available in Heartsome.

May 23, 2017

IntelliWebSearch: really the best Windows-based search tool for translators.

When I began using Michael Farrell's IntelliWebSearch (IWS) about a year ago, shortly before a few IAPTI webinars on that subject, I was impressed with the tool's flexibility, but one thing drove me nuts: the browser kept adding tabs with each search, unlike the tool I favored at the time, memoQ Web Search. But the latter is restricted to use within memoQ, so I had some hope of sorting out the problem with IWS.

I asked the program's author for a solution, but I think I failed to articulate the problem properly: I was told that this was simply a shortcoming I would have to live with. Not true. Michael's tool is better than he said.

The solution turned out to be in the program's settings, which are accessed under the Edit menu.


An example of "improved" settings more to my taste is above. The important thing for me to get the behavior I wanted was to define the return behavior. Use the return shortcut and close the browser. Subsequent actions can include pasting any copied text if you like.

Of course, adding extra tabs to the open browser is not such a bad thing in some cases, providing a sort of tab-based "history" of the searches. And simply using the search window shortcut opens an IWS window with text copied to a search field, where individual searches can be launched in the browser of choice using icons for various configured searches.

The much greater flexibility of IntelliWebSearch, its universal application in any Windows software, its memory stability (memoQ Web Search has had a serious memory leak for a long time, resulting in crashes and other troubles) and its very modest price for licenses after a 2-month trial make it my search tool of choice now that I can get the browser window behaviors I want. And various "profiles" for searching can be saved in external files for backup and sharing with others.

For educational and professional use, this is a superb choice. The program can also be linked to local information, such as CD-based dictionaries or desktop search tools. Check it out!

Your working software tools as Xbox "games" in Windows 10!

For the last few days I have been away from the office, working from home on a relatively new laptop which doesn't have a lot of the software installed that I use on my main machine. Then today when I needed to make a screen recording to document a memory leak in one of my software tools, I was annoyed to realize that Camtasia wasn't installed on the laptop and I had to find some other means of video capture.

That was when I found out about the nice little video recording tool included in a somewhat obscure way with the Windows 10 operating system. When invoked for the first time in an application, such as memoQ, the Windows Task Manager or anything else, you'll be asked if the program you are running is a game. Lie and click Yes, this is a game.



The recording bar invoked with the Windows-G key looks like this:


Continuous recordings can be made for long periods of time, but the really cool feature of this recorder is that it can be set up to maintain a history of a defined period just passed and save this history as an MP4 video file.


The default is 30 seconds; in the screenshot above, the backward recording buffer is set to three minutes.

What good is this? Well, one thing you can do is record a retroactive video after the program you use crashes. This can then be submitted to support experts to help them figure out what went wrong, or you can review the recording yourself to see what was done.

The videos are stored in the default path for Videos in a folder named Captures:


A very boring example of this is shown below; it shows the activity in the Windows Task Manager as I launch various applications. The results showed me the steady increase in memory consumption by the memoQ Web Search feature (amounting to several gigabytes after perhaps 20 minutes, leading to crashes and/or other problems) versus exactly the same search in 5 tabs of Internet Explorer using IntelliWebSearch. The latter is rock stable in its memory use, causing no problems at all and offering much greater flexibility, which is why I strongly recommend this search productivity tool, which can be accessed from any Windows application.


Dec 27, 2016

Free shareable, searchable glossaries for collaboration with anyone

Some years ago I suggested a procedure using Google spreadsheets for glossary collaboration in projects. Many people do this sort of thing now.

What I do not think most are doing, however, is accessing these web-based term lists efficiently as terminology resources in their work. It's hard to compete with the efficiency of integrated termbases, TMs, web search features, etc.

... unless of course you integrate a web search for those online spreadsheets which returns just the few data of interest.

Matches found for German "ladepresse" in a glossary of a few thousand hunting terms
This is fairly straightforward using Google's visualization API with a simple query. A parameterized URL can be built to perform custom searches of your own data or data shared by colleagues or clients. "Canned" queries can be easily incorporated in custom searches from many tools, including memoQ Web Search, IntelliWebSearch and others.


Building a custom search URL for your Google spreadsheet is fairly simple. In the example above it consists of three parts:

{base URL of the spreadsheet} + /gviz/tq?tqx=out:html&tq= + {query}

The middle part, /gviz/tq?tqx=out:html&tq=, invokes the Google visualization API and specifies that the query results be returned as HTML (for display in a browser). The query language is similar to SQL, but if you use a prepared query for a given spreadsheet table structure, you don't need to learn any of that. Queries can be made which also return definitions, images, context examples or anything else that might reside in columns of interest in the online spreadsheet.
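For those who prefer to see the three parts assembled in code, here is a minimal Python sketch. The spreadsheet key is a hypothetical placeholder, the sheet is assumed to be shared so that anyone with the link can view it, and the query assumes the source terms are in column A. Applying lower() to the column makes the match case-insensitive, which suits capitalized German nouns.

```python
from urllib.parse import quote

GVIZ_HTML = "/gviz/tq?tqx=out:html&tq="

def term_query(term: str, column: str = "A") -> str:
    """Query returning every row whose given column contains term (any case)."""
    # A single quote would end the string literal in the query language,
    # so this sketch simply drops it from the search term.
    term = term.replace("'", "").lower()
    return f"select * where lower({column}) contains '{term}'"

def gviz_url(sheet_base_url: str, query: str) -> str:
    """{base URL of the spreadsheet} + /gviz/tq?tqx=out:html&tq= + {encoded query}"""
    return sheet_base_url.rstrip("/") + GVIZ_HTML + quote(query, safe="")

# Hypothetical spreadsheet; substitute the base URL of your own shared sheet.
base = "https://docs.google.com/spreadsheets/d/EXAMPLE_SHEET_KEY"
print(gviz_url(base, term_query("Ladepresse")))
```

Pasting the printed URL into a browser is a quick way to check a query before it goes into a memoQ or IntelliWebSearch profile.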

Using a tool like IntelliWebSearch or integrated extensions of OmegaT, memoQ and other tools, users working with any sort of tools can share a live glossary. Google Spreadsheets also have some permissions/security features which can be investigated if needed.

Of course other data can be shared this way, including TMs or XLIFF data as well as monolingual information. A little study of the relevant Google documentation reveals many possibilities :-)

Getting the picture with automated web searches

Like many other translators, I have come to appreciate the value and the complications of Internet searches in my work. As the garbage accumulated on the World Wide Web grows ever deeper, focused searches are more important than ever to get past the noise to find the information required, then get back to work.

Integrated tools for focused searches on multiple web sites are popular with many. IntelliWebSearch (IWS), memoQ Web Search and similar tools can be an enormous boost to productivity. But I doubt that many people give much thought to optimizing that possibility in general or for particular jobs.

Google searches are very popular. The Advanced Search features are particularly useful. For example, I find translating Austrian legal texts to be difficult sometimes, because an ordinary Google search of relevant legal terms yields too much interference from sites in other German-speaking countries. However, a search configured like this:

https://www.google.com/search?as_epq=schwerer+Betrug&lr=lang_de&as_sitesearch=www.jusline.at

will yield only results in German from the Austrian site Jusline, which is very helpful if I am looking for the specific definition of "schwerer Betrug" in the jurisprudence of that country.

Similarly, a financial translator working with Austrian texts might use a search like

https://www.google.pt/search?as_epq=Umlage&as_sitesearch=www.afrac.at
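Both of these URLs follow the same pattern, so a small helper can generate them for any phrase, site and language. Here is a minimal Python sketch, assuming only the as_epq, lr and as_sitesearch arguments used above. Google does not formally document these URL arguments and could change them.

```python
from urllib.parse import urlencode

def google_site_search(phrase, site, lang=None, host="www.google.com"):
    """Exact-phrase Google search restricted to one site, optionally one language."""
    params = {"as_epq": phrase}           # exact phrase
    if lang:
        params["lr"] = f"lang_{lang}"     # results language, e.g. lang_de
    params["as_sitesearch"] = site        # restrict results to this site
    # urlencode() joins the arguments with & and encodes spaces as +.
    return f"https://{host}/search?" + urlencode(params)

print(google_site_search("schwerer Betrug", "www.jusline.at", lang="de"))
print(google_site_search("Umlage", "www.afrac.at", host="www.google.pt"))
```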

In my technical work, very often I must look for images of a component or process described. For a long time I did this inefficiently: I searched Google, then clicked the Images link and waded through the chaos to find what I needed. But if I am translating the catalog of the hunting supplier Frankonia, that's stupid. I can do a very specific search like this:

https://www.google.com/search?q=wildbergehaken+site:frankonia.de&tbm=isch

which will open a Google Images search directly (that's what the argument tbm=isch does), using only pictures culled from the site of the retailer whose material I am working on.

An image search using Wikipedia.org can often be very helpful to identify an unknown term and navigate to related articles in various languages. For example, a person encountering an unknown word in Russian might use this search:

https://www.google.com/search?as_epq=собака&as_sitesearch=wikipedia.org&tbm=isch

and quickly see what the term is about.
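The image searches can be generated the same way. Here is a minimal Python sketch, assuming only the two URL styles shown above: an exact phrase with as_sitesearch, or a site: operator inside q. The only new ingredient is tbm=isch, and as with the other arguments, Google could change it without notice.

```python
from urllib.parse import urlencode

def google_image_search(phrase, site):
    """Exact-phrase Google Images search limited to one site (tbm=isch)."""
    params = {"as_epq": phrase, "as_sitesearch": site, "tbm": "isch"}
    # Cyrillic and other non-ASCII text is percent-encoded as UTF-8.
    return "https://www.google.com/search?" + urlencode(params)

def google_image_search_q(term, site):
    """Same idea with a site: operator in q, as in the Frankonia example."""
    # safe=":" keeps "site:frankonia.de" readable instead of site%3Afrankonia.de.
    return "https://www.google.com/search?" + urlencode(
        {"q": f"{term} site:{site}", "tbm": "isch"}, safe=":")

print(google_image_search("собака", "wikipedia.org"))
print(google_image_search_q("wildbergehaken", "frankonia.de"))
```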


The search results above were obtained with memoQ Web Search, where I have the Wikipedia image search preconfigured:


Astute readers may notice the slight difference in syntax between the search in the screenshot and the Russian example I gave. There is more than one way to skin a cat with web searches. Or a dog in this case. To restrict searches to the wiki for one particular language, just add the prefix for that subsite to the domain: de.wikipedia.org for German, for example.

If you need to do such searches from many different applications under Windows, IntelliWebSearch might be a better choice for the preconfigured searches. I think it also handles a lot of tabs better, and it uses the ordinary browser setup instead of the more restricted options of memoQ's integrated mini-browser. I don't really like the fact that IWS keeps adding tabs to the browser, so I close it between searches, and to avoid messing up other work I am doing in Chrome (my default browser), I configure IWS to use another browser like Opera or Microsoft Edge.

Anyone who would like the light resource file for one of my German/English profiles for memoQ's web search can get it here. It includes the image search in Wikipedia and has a number of (mostly deactivated) custom search tabs useful for intellectual property translation. A few of the searches are for engines which require manual input of terms, but I find it convenient to have these on a tab for quick access.

Aug 23, 2016

Reminder: web search tutorial this Friday!


Time is running out to register for Michael Farrell's webinar this Friday on the basics of IntelliWebSearch, a scripting tool that runs under Windows and enables multiple, simultaneous web searches using text selected in any application.

I used to be rather sceptical of this sort of tool, but in the past several years (since a similar, less powerful feature was introduced in memoQ) I have found this to be among the greatest contributors to my research and translation productivity. It saves time and reduces my work fatigue over the course of a long day.

The online workshop is free to IAPTI members and very affordable to everyone else (USD 25, or a bit less if you are a member of a partner association).

There will be a more advanced presentation to follow in September, which does not require participation in this one, but which does assume that you know the basics of IWS.

Jun 15, 2016

Better ways to search the Internet while translating

When Kilgray introduced memoQ Web Search a few years ago, I was unimpressed, because I was fairly efficient at working with the several tabs of my favorite sites to search for information during translation projects and I couldn't imagine much value to be had from an "integrated" search in a stripped-down custom browser. And the buggy example templates shipped with the memoQ release (several search setups are incorrect) didn't help much. It was only when I began to take a careful, systematic look at this feature to document it for my memoQuickies guide to configuration that I realized how straightforward it really was, and since then, despite ongoing bugs in the feature which sometimes lead to crashes, it has become one of the most important practical features of the product for me. Being able to select a text in the source or target and hit a hot key to search multiple sites at once really does save me time. Lots of it.

There has, of course, been another product around which does that too, which works in more or less any Windows environment and which is far more configurable. But when I took my first look at IntelliWebSearch (IWS) years ago, I was put off and confused by the nerdishness of the presentation, and I just wasn't ready to be told how many damned options I had when I was trying to get my head around the simplest basics. Recently, however, I have become increasingly irritated at little things with memoQ Web Search, including the impossibility of adding my user credentials to turn off ads on some sites I use, and I began to wonder if IWS might not be worth another look. And indeed it is.

 Go to the IWS web site and learn more!

I am a big fan of multiple concordance search sets in integrated translation environments - this isn't a feature in any of them as far as I know, but it is accomplished easily enough. In memoQ I use the integrated concordance feature to search all the translation memories and corpora attached to a project for my "primary" concordance search set. The memoQ TM Search Tool is configured to search another set of TMs, including some with languages other than those in my project. And for blockbuster TM concordance searches with massive resources like the DGT data sets I have TMLookup. That is three differently configured concordance searches available at a keystroke in my working environment, and you can do the same thing in your favorite CAT tool as well.

So why not try this with web searches? Searching too many tabs at once tends to be slow, which is why I generally recommend no more than a few favorites be used with the integrated memoQ feature. IWS, unlike my usual tool, offers the possibility of configuring multiple "search groups", all of which can be accessed from anywhere with a hot key combination you assign. So I tried this for a few special sites that I usually don't want in my memoQ Web Search but which are a nuisance to deal with manually when I need them. It took me just a few minutes to install IWS, and with the help of a couple of short tutorial videos on the tool's web site, I had my custom searches set up (the configuration wizard to do this is dead easy and user friendly), and in about 10 minutes I was happily invoking special web searches on multiple tabs of my default browser (where I have configured some sites to shut off the damned ads) while I worked in the memoQ translation and editing grid.

So IntelliWebSearch today isn't nearly as difficult to figure out as it seemed years ago. Whether the product has improved or my head is just a little less cluttered now I'm not sure. But it's a very useful tool and a good extension of my working environment which I can recommend with more confidence. I can do useful things without drowning in the depth of its product features.

Nonetheless, I wouldn't mind a guided tour from someone with more of a clue than I have. And later this summer I have exactly such an opportunity. Colleague Michael Farrell, the Italian to English translator who created IntelliWebSearch for all of us, is giving two webinars sponsored by IAPTI, which offer a thorough grounding in the basics of this productivity booster. Information with links (click the images!) is below. I'll be there and hope you'll join me and learn useful things to help in your research of monolingual and multilingual information sources on the Internet.

 Registration for the webinar

 Registration for the second webinar