Jun 29, 2013

Caption editing for YouTube videos

I've spent a great deal of time in recent weeks examining different means for remote instruction via the Internet. In the past I've had good success with TeamViewer to work on copywriting projects with a partner or deliver training to colleagues and clients at a distance. So far I have avoided doing webinars because of the drawbacks I see for that medium, both as an instructor and as a participant, but I haven't completely excluded the possibility of doing them eventually. I've also looked at course tools such as Citrix GoToTraining and a variety of other e-learning platforms, such as Moodle, which is used by universities and schools around the world and which also seems to be the choice of Kilgray, ProZ and others for certain types of instruction.

Recorded video can be useful with many of these platforms, and since I've grown tired of doing the same demonstrations of software functions time and again, I've decided to record some of these for easy sharing and re-use. When I noticed recently that my Open Source screen recording software, CamStudio, had been released in a new version, I decided quite spontaneously to make a quick video of pseudotranslation in memoQ to test whether a bug in the cursor display for the previous version of CamStudio had been fixed.

After I uploaded the pseudotranslation demo to YouTube, I noticed that rather appalling captions (subtitles) had been created by automatic voice recognition. Although voice recognition software such as Dragon NaturallySpeaking is usually very kind to me, Google's voice recognition on YouTube gave miserable results.

I soon discovered, however, that the captions were easy to edit and could also be exported as text files with time cues. These text files can be edited very easily to correct recognition errors or combine segments to improve the timing and subtitle display.
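For anyone who would rather script the segment-combining step than do it by hand, here is a minimal sketch in Python. It assumes the exported file uses a SubViewer-style layout (a `start,end` time line such as `0:00:01.300,0:00:04.000`, followed by the caption text and a blank line); check your own export before relying on that. The function names and the `min_duration` and `max_gap` thresholds are my own illustrative choices, not anything defined by YouTube.

```python
import re
from dataclasses import dataclass

# Assumed time-cue layout: H:MM:SS.mmm,H:MM:SS.mmm (SubViewer-style export).
TIME = r"(\d+):(\d{2}):(\d{2})\.(\d{3})"
CUE_HEADER = re.compile(rf"^{TIME},{TIME}$")


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float
    text: str


def to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_sbv(data: str) -> list[Cue]:
    """Split the file on blank lines and read one cue per block."""
    cues = []
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = block.strip().splitlines()
        if not lines:
            continue
        match = CUE_HEADER.match(lines[0].strip())
        if not match:
            continue  # skip anything that does not look like a cue
        parts = match.groups()
        text = " ".join(line.strip() for line in lines[1:])
        cues.append(Cue(to_seconds(*parts[:4]), to_seconds(*parts[4:]), text))
    return cues


def merge_short_cues(cues, min_duration=2.0, max_gap=0.5):
    """Fold a cue into the previous one if the previous cue is very short
    and the pause between the two is small (thresholds are hypothetical)."""
    merged = []
    for cue in cues:
        prev = merged[-1] if merged else None
        if (prev and prev.end - prev.start < min_duration
                and cue.start - prev.end <= max_gap):
            merged[-1] = Cue(prev.start, cue.end, f"{prev.text} {cue.text}")
        else:
            merged.append(cue)
    return merged


def format_time(seconds: float) -> str:
    total_ms = round(seconds * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h}:{m:02d}:{s:02d}.{ms:03d}"


def to_sbv(cues) -> str:
    """Write the cues back out in the same assumed layout."""
    blocks = (f"{format_time(c.start)},{format_time(c.end)}\n{c.text}" for c in cues)
    return "\n\n".join(blocks) + "\n"
```

Typical use would be something like `to_sbv(merge_short_cues(parse_sbv(text)))`, followed by a read-through to fix the recognition errors themselves, which no script will do for you.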

Once the captions for the original language are cleaned up and the timing is improved, the text files can be translated and uploaded to the video in YouTube to create caption tracks in other languages. As a test, I did this (with a little help from my friends), adding tracks for German and European Portuguese to the pseudotranslation demo. And if anyone else cares to create another track for their native language from this file, I'll add it with credits at the start of the track.

It's easy enough to understand why I might want to add captions in other languages to a video I record in English or German. But why would I want to do so in the original language? My thick American accent is one reason. I like to imagine that my English is clear enough for everyone to understand, but that is a foolish conceit. Of course I speak clearly - I couldn't use Dragon successfully if that were not true. But someone with a knowledge of English mostly based on reading or interacting with people who have very different accents might have trouble. It happens.

Although most of the demonstration videos SDL has online for SDL Trados Studio are easy to follow, some of the thick UK accents are really frightening and difficult for some people in places like Flyover America to follow. Some Kilgray videos of excellent content are challenging for those unaccustomed to the accents, and the many wonderful demos of memoQ, Wordfast, OmegaT and other tools by CAT Guru on YouTube would have been difficult for me before I was exposed to the linguistic challenges of the wide world that can English. All of these excellent resources in English would benefit from clear English subtitles.

How difficult is it to create captions? The three-minute pseudotranslation demo cost me about ten minutes of work to clean up the subtitles. The English captions for another slightly shorter video explaining the use of the FeeWizard Online to estimate equivalent rates for charging by source or target words, lines, pages, etc. also took me about 10 or 15 minutes with all the text and timing corrections. And I've spent a good bit of time in the past week transcribing a difficult spoken English lecture by a German professor: it took me about 7 hours of transcription work to cope with a spoken hour. I don't know if this is typical, because I almost never do this sort of thing, and there were a lot of WTF moments. But I suppose three to seven times the recording length might be a reasonable range for estimating the effort of a draft edit and some timing changes. Not bad, really.
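The arithmetic behind that range is simple enough to put in a few lines of Python. This is only a sketch of my own rule of thumb: the three-minute demo took about ten minutes (roughly 3x), the lecture took 7 hours for one spoken hour (7x), and the function name and default factors below are illustrative assumptions, not measured constants.

```python
def caption_effort_minutes(video_minutes, low_factor=3.0, high_factor=7.0):
    """Return a (low, high) estimate in minutes for a draft edit with
    timing changes, using the rough three-to-seven-times range."""
    return video_minutes * low_factor, video_minutes * high_factor


# A one-hour lecture: somewhere between 3 and 7 hours of work.
low, high = caption_effort_minutes(60)
print(f"{low / 60:.0f} to {high / 60:.0f} hours")
```

Clear speech and a short recording will land near the low end; difficult audio will push the figure toward the high end or past it.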

So if you are involved in creating instructional videos to put on YouTube or use elsewhere, please consider this easy way of making good work even better by investing a little time in caption creation and editing. Once you have done this for the original language, it will also be a simple matter to translate those captions to make your content even more accessible.


  1. Just came across this, Kevin - sorry for the delay. What we did in one of our courses for which we produced captioned videos was the following: we used Dragon NaturallySpeaking to re-speak the video. The resulting transcript we saved as plain text. We uploaded it to YouTube as a caption file, and then came the magic bit: the YouTube/Google automatic voice recognition, crappy otherwise, was good enough to produce accurate timecodes automatically from our plain text caption file. This blogpost details the process: http://elearningbakery.com/video-scribing-animating-on-a-shoestring/ although it does not talk about the latest videos we did. One other tip: when creating the transcript, insert full stops at the end of all sentences (don't leave things unpunctuated like normal subtitles) because if you don't, the YouTube matching algorithm will not work as well. I hope this helps :)

  2. Interesting. I've been working the opposite direction, taking the crappy conversions from YouTube and fixing them. I'll have to try uploading a plain text caption file and see how the synchronization works. Thank you for the tip. I have quite a backlog of videos to catch up on with subtitles, so I'll have ample opportunity to test this.

  3. Nowadays, you can also outsource the task of captioning videos to a captioning service, which is actually very affordable. Services like DirectCaption.com only charge $1 per minute and they're very reliable too. You can even just copy and paste the YouTube link to have it captioned.

    1. Chase, I've done enough captioning work to know the effort involved. At those rates, well... you pay peanuts, you get monkeys. And frankly, I prefer competent human beings to do my caption work for me :-) The others drive me bananas.

    2. Agreed that we can choose how we want to do captioning or who we want to hire to do it. Captioning services just offer an alternative for YouTubers, video creators and distributors, etc., who may not have the time to do it themselves, and it also serves an encouragement for people to actually have their videos captioned, rather leaving them without captions or with very poor ones. With the industry becoming more competitive, the quality will no doubt be on the rise as well. Thanks for your reply :)

    3. Get real, Chase. At the rates you quote, the most you would get from a monkey would be the shit he'll throw at you from his cage in the zoo. You say the service you cited "serves [as] an encouragement for people to actually have their videos captioned, rather leaving them without captions or with very poor ones". You stand corrected. You will in fact get very poor work and possibly ruin your reputation unintentionally. It is better to leave content untranslated than to dip your head in the barrel of merde in the linguistic domain of the apes. There will, of course, be other perspectives on this from many Linguistic Sausage Providers and the usual snake oil hawking suspects among the thought bleeders at the Common Nonsense Advisory, TAUS, et cetera to whom we should all be eternally grateful for proving so clearly even to the non-technical folk among us that the old plumbing adage about shit running downhill is true and beyond doubt. Unless, of course, you add pressure from Lionbridge, TransPerfect, thebigword and other such sausage factories to the lines, in which case one can move the shit to the highest levels by hydraulic action, though it is unlikely that the "resources" used by these solution providers have the skills to handle the unit conversions accurately to tell the readers in other languages exactly how high :-)

