Automatic Synchronous Video Transcription

I want a system that I can run a video or audio file through that will transcribe every word spoken aloud in the file and put the text back into the file, aligned with the moment each word is spoken.

Then I could easily search that file for specific keywords and content without having to tag the video with keywords by hand. If the video or audio is an hour-long presentation but I’m only interested in three minutes of it, this kind of system would make finding those three minutes a lot easier.
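To make the search idea concrete, here is a minimal sketch of what a time-aligned transcript could look like and how a keyword lookup would jump straight to the right minutes. It assumes a speech recognizer (not shown) has already produced segments with start and end times in seconds. The `Segment` type, the `find_keyword` and `timestamp` helpers, and the sample transcript are all hypothetical, made up for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One span of transcribed speech; times are seconds from the start of the file."""
    start: float
    end: float
    text: str


def find_keyword(segments, keyword):
    """Return the segments whose text contains `keyword` as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [seg for seg in segments if pattern.search(seg.text)]


def timestamp(seconds):
    """Format a time in seconds as H:MM:SS, dropping fractions of a second."""
    whole = int(seconds)
    return f"{whole // 3600}:{whole % 3600 // 60:02d}:{whole % 60:02d}"


# Hypothetical transcript of an hour-long talk, as a recognizer might emit it.
transcript = [
    Segment(0.0, 4.2, "Welcome, everyone, to today's presentation."),
    Segment(1520.5, 1526.0, "Now let's talk about search engines."),
    Segment(1700.0, 1705.3, "Search is the part most of you asked about."),
    Segment(3590.0, 3598.0, "Thanks for coming."),
]

# Print where each mention of the keyword occurs, so a player could seek there.
for seg in find_keyword(transcript, "search"):
    print(timestamp(seg.start), seg.text)
```

Storing text per timed segment, rather than as one flat block, is what makes the lookup useful: every hit carries a position in the file that a video player could seek to directly.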

Of course, as soon as such a system is brought onto the web and tied into YouTube, Google Video, and other such online multimedia systems, it gets far more interesting.

I’m sure the government already has such systems in place (like ECHELON), but I’m talking about a consumer-friendly system here.


  1. Posted February 1, 2007 at 11:28 pm

    I’m pretty sure Google’s thrown a few million at the idea! Hope it makes it to market soon. I call myself using Pinger and leave myself voice notes when driving, and I always think how great it would be to have them transcribed.

  2. Posted February 5, 2007 at 6:00 am

    How does a closed captioning system work? Is someone typing those words into the video, or is a piece of technology transcribing the speech to text on the fly? If the latter is true, then the output could be captured with a little electronics know-how.

  3. Posted February 5, 2007 at 6:08 am

    OK, so I decided to find my own answers on Wikipedia, and it turns out that most TV programs are captioned ahead of time by a person, but live programs use a speech-to-text program of some sort. That means one could get the program and train it to recognize one’s own voice, or use it to transcribe content that doesn’t contain that voice. As long as “line 21” (the part of the analog TV signal that carries closed captions) is populated with data, I think it could be captured and tagged to the video at specific time intervals.

  4. Posted February 6, 2007 at 7:40 pm

    Yeah, just reading your description of the system’s needs makes my head spin… just hire cheap labor to transcribe… that’s what most of the corporate world does!