Translation Tribulations: index

Showing posts with label index. Show all posts

Sep 11, 2018

Adding time codes to YouTube videos

For years now, I have advocated the use of tables of contents for long instructional videos, recorded webinars and suchlike. I saw these in a few instances, but it was never clear how the indices were produced, so I suggested merely writing a list of relevant points and their play times and scrolling manually. Understandably, not many adopted this suggestion.

Then I discovered that my video editor (Camtasia) could create tables of contents for a video automatically when creating a local file, an upload to YouTube or other exports if timeline markers were added at relevant points. The only disadvantage for me with this approach was the limit on the length of the descriptive text attached to the markers. Worse than Twitter in the old days.

But when I accidentally added a marker I didn't want and removed it from the YouTube video description (which is where a TOC resides on YouTube), I saw that things were much simpler than I imagined. And a little research with tutorials made by others confirmed that any time code written at the beginning of a line in the video's description will become a clickable link to that time in the video.

So I've begun to go through some of my old videos with a text editor opened along side. When the recording gets to a point that I want to include in the table of contents, I simply pass the cursor over the video, take note of the time, and then write that time code into the text file along with a description of any length.

Afterward, I simply paste the contents of that text file into the description field in YouTube's editor. When the Save button at the top right is clicked, the new description for the video will be active, and viewers can use the index to jump to the points they want to see. Because only a few lines of the description text are visible by default, I include a hint at the beginning of the text to let people know that the live table of contents is available if they click the SEE MORE link.

If Kilgray, SDL, Wordfast and others involved with the language services sector would adopt techniques like this for their copious recorded content on the Web, the value and accessibility of this content would increase enormously. It would also be very simple then to create hot links to important points in other environments (PowerPoint slides, PDF files, etc.) to help people get to the information they need to learn better.

Not to do this would truly be a great waste and a shame in many cases.

Oct 15, 2015

The Invisible Hand of memoQ LiveDocs - making "broken" corpora work

Last month I published a post describing the "rules" for document visibility in the list of documents for a memoQ LiveDocs corpus. Further study has revealed that this is only part of the real story and is somewhat misleading.

I (wrongly) assumed that, in a LiveDocs corpus, if a document was visible in the list its content was available in concordance searches or the Translation Results pane, and if it was not shown in the list of documents for the corpus in the project, its content would not be available in the concordance or Translation Results pane. Both assumptions proved wrong in particular cases.

In the most recent versions of memoQ, for corpora created and indexed in those versions, all documents in a corpus shown in the list will be available in the concordance search and the Translation Results pane as expected. And the rules for what is currently shown in the list are described accurately in my previous post on this topic. However,

if there are documents in the corpus which share the same main language (as EN-US and EN-UK both share the main language, English) but are not shown in the list, these will still be used for matching in the memoQ Concordance and Translation Results and
if the corpus was created in an older version of memoQ (such as memoQ 2013R2), documents shown in the list of a corpus may in fact not show up in a Concordance search or in the Translation Results.

This second behavior - documents shown in the list but their content not appearing in searches - has been described to me recently by several people, but it could not be reproduced at first, so I thought they must be mistaken, and statements that "sometimes it works and sometimes it doesn't" made these pronouncements seem even more suspect. Except that they happen to be true and I now (sort of) understand why.

Prior to publishing my post to describe the rules governing the display of documents for a LiveDocs corpus in a project, I had been part of a somewhat confusing discussion with one of my favorite Kilgray experts, who mentioned monolingual "stub" documents a number of times as a possible solution to content availability in a corpus, but when I tried to test his suggestion and saw that the list of documents on display in the corpus had not expanded to include content I knew was there, I thought he was wrong. But actually, he was right; we were talking about two different things - visibility of a document versus availability of its content.

For purposes of this discussion, a stub document is a small file with content of no importance, added only to create the desired behavior in memoQ LiveDocs. It might be a little text file - "stubby.txt" - with any nonsense in it.

I went back to my test projects and corpora used to prepare the last article and found that in fact for the main languages in a project all the content was available from the corpora, regardless of whether the relevant documents were displayed in the list. In the case of a corpus not offered in the list for a project because of sublanguage mismatches in the source and target, adding a stub document with either a generic setting (DE, EN, PT, etc.) or sublanguage-specific setting for the source language or the correct sublanguage setting for the target (DE-CH, EN-US, etc.) made all the corpus content for the main languages available instantly. (In the project, documents added will have the project language settings; use the Resource Console for any other language settings you want.)

Content of a test corpus before a stub document was added. Viewed in the Resource Console.

The test corpus with the document list shown in my project; only the stub document is displayed, but
all the indexed content shown above is also available in the Concordance and Translation Results.

It is unfortunate that in the current versions of memoQ the document list for a corpus in a project may not correspond to its actual content for the main languages. Not only does this preclude accessing a document's content without a match or a search, it also means that binary documents (such as one of the PDF files shown in the list) cannot be opened from within the project. I hope this bug will be fixed soon.

Since a few of my friends, colleagues and clients were concerned about odd behavior involving older corpora, I decided to have a look at those as well. Kilgray Support had made a general recommendation of rebuilding these corpora or had at least suggested that problems might occur, so I was expecting something.

And I found it. Test corpora created in the older version of memoQ (2013 R2) behaved in a way similar to my tests with memoQ 2015 - although the "display rules" for documents in the list differed as I described in my previous blog post, the content of "hidden" documents was available in Concordance searches and in the Translation Results pane. But....

When I accessed these corpora created in memoQ 2013 R2 using memoQ 2015, even if I could see documents (for example, a monolingual source document with a generic setting), the content was available in neither the Concordance nor the Translation Results until I added an appropriate stub document under memoQ 2015. Then suddenly the index worked under memoQ 2015 and I could access all the content, regardless of whether the documents were displayed in the list. If I deleted the stub document, the content became inaccessible again.

So what should we do to make sure that all the content of our memoQ corpora are available for searches in the Concordance or matches in the Translation results?

If you always work out of the same main source language (which in my case would be German or "DE", regardless of whether the variant is from Germany, Austria or Switzerland), then add a generic language stub document for your source language to all corpora - old and new - under memoQ 2015 and all will be well.

If your corpora will be used bidirectionally, then add a generic stub for both the source and target to those corpora or add a "bilingual stub" with generic settings for both languages. This will ensure that the content remains available if you want to use the corpora later in projects with the source and target reversed.

Although it's hard to understand the principles governing what is displayed, when and why, following the advice in the red text will at least eliminate the problem of content not being available for pretranslation, concordance searches and translation grid matches. And the mystery of inconsistent behavior for older corpora appears to be solved. The cases where these older corpora have "worked" - i.e. their content has been accessible in the Concordance, etc. - are cases where new documents were added to them under recent versions of memoQ. If you just keep adding to your corpora, doing so particularly from a project with generic language settings, you'll not have to bother with stub documents and your content will be accessible.

And if Kilgray deals with that list bug so we actually see all the documents in a corpus which share the main languages of a project, including the binary ones, then I think the confusion among users will be reduced considerably.

Aug 15, 2013

Comments on memoQ comments and YouTube playlists

I recently produced a small video tutorial on what I feel are the useful aspects of the comment feature in memoQ 2013. Although quite a few new things have been introduced to commenting in the current version of the software, the real significance of these changes for ordinary users of the software is limited. Now that what was broken in the memoQ 2013 release is largely fixed, those who care about comments for offline use can continue to use this great feature without much inconvenience.

Here is an idiosyncratic overview of how I use comments in my projects (HINT: these embedded videos are easier to watch if you do that in full screen mode by clicking the icon at the lower right of the play window):

Time Description
0:28 Opening the comment dialog
1:01 Commenting highlighted text
1:48 Adding "codes" to comments for later filtering
2:44 Selecting all files, creating a view with all comments
3:35 Comments shown in speech bubble tooltips
3:57 Creating a filtered list of comments (code = '@PM')
4:50 Creating a filtered list of comments (code = '@CST')
5:20 Exporting commented segments in a bilingual RTF file
6:15 Check segments for extraneous comments before sharing the exported list

What I didn't show here is my usual way of accessing and exiting the comment dialog: keyboard control, opening with Ctrl+M and exiting with a quick tab to the OK button and hitting the Enter key. Having multiple comments makes editing slightly less convenient if one has to click on an icon, but the ease of deleting an entire comment in a series, and the separation of comments by a new paragraph in an exported RTF bilingual file are compensating conveniences.

That six-and-a-half minute video really has more information than someone generally familiar with the old way of using comments in memoQ would care about. The only part which might really interest someone who already knows how to create an exportable view with commented segments is how the procedure for creating views of selected comments differs from creating a view with all comments. So I decided to use the excerpting feature of YouTube playlists to create a special "view" of the tutorial video which shows only that little bit from which I believe many experienced users may benefit.

(Use the link above to look at the trimmed playlist on YouTube - I removed the embedded video code here, because its behavior in the Google Blogger environment seems to be quirkier than links on a Moodle web page or Facebook page. This technique is useful but may still require careful testing of the environment in which it will be used.)

No index needed here - the video is barely over a minute long in its two parts. This technique of playlist excerpting on YouTube could be used to "mine" longer teaching videos for specific bits of information needed to understand a specific issue. One can combine separate video clips, "in whole or in part" as contract lawyers like to say, or individual segments of a single video as I have done here. This is a useful technique which, along with time index lists such as that shown above, I hope to see applied more often for the education and support of translators.

How does Kilgray present the state of commenting in memoQ? The video clip below is from Kilgray's YouTube channel - interesting, but really another world. The video shows the commenting feature as it was at the end of May, with "innovations" which sparked the Commentgate controversy.

This presentation is really very focused on users of the memoQ server, because all that lovely highlighting is only visible in the memoQ environment itself, and these comments with highlighting do not currently export in the usual medium for sharing feedback (comments) with clients offline: RTF bilingual files. In fact, it's really a shame that in all the years that memoQ has offered exportable comments, this very helpful feature has hardly been part of the official teaching, because in the real world of client relationships, it is often a great asset.

Jun 16, 2012

memoQuickie: footnote, cross-reference & index entry segmentation in Microsoft Word files

If you have a Microsoft Word DOC file or RTF to translate, it is important to be aware of the different behaviors of the memoQ import filter options you can use. If there are footnotes, cross-references or index entries, it is far better to use the option to import the DOC or RTF file as DOCX.

The DOC file shown below has a footnote, a cross-reference and an index entry: