I want a system that can take a video or audio file, transcribe every word spoken in it, and embed that transcript back into the file so that the text is aligned with the moment each word is spoken.
Then I could easily search that file for specific keywords and content without having to tag the video with those keywords by hand. If the video or audio is an hour-long presentation but I’m only interested in three minutes of it, a system like this would make finding those three minutes much easier.
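To make the search idea concrete, here is a minimal sketch of the lookup half only. It assumes some speech recognizer has already produced a list of timestamped transcript segments; the `Segment` type, the `find_keyword` and `format_time` helpers, and the sample transcript are all hypothetical illustrations, not part of any existing tool. Each match returns the time range where the keyword is spoken, which is exactly what you would need to jump straight to those three minutes.

```python
import re
from dataclasses import dataclass


# Hypothetical transcript segment: assumes a recognizer already produced
# start/end times (in seconds from the beginning of the file) for each phrase.
@dataclass
class Segment:
    start: float
    end: float
    text: str


def find_keyword(segments, keyword):
    """Return (start, end) for every segment containing keyword as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [(s.start, s.end) for s in segments if pattern.search(s.text)]


def format_time(seconds):
    """Format a time in seconds as H:MM:SS, dropping fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:d}:{minutes:02d}:{secs:02d}"


# Made-up sample data standing in for an hour-long presentation.
transcript = [
    Segment(0.0, 4.5, "Welcome to the presentation."),
    Segment(1830.0, 1836.2, "Now let's talk about speech recognition."),
    Segment(1836.2, 1841.0, "Recognition accuracy depends on audio quality."),
]

for start, end in find_keyword(transcript, "recognition"):
    print(format_time(start), "-", format_time(end))
```

The hard part, producing accurate word-level timestamps from the audio, is assumed away here; once that exists, searching is just a scan over short, time-tagged pieces of text.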
Of course, as soon as such a system is brought onto the web and tied into YouTube, Google Video, and other such online multimedia systems, it gets far more interesting.
I’m sure the government already has such systems in place (like ECHELON), but I’m talking about a consumer-friendly system here.