Transcription is the New Timecode

One of the most valuable forms of Content Metadata is transcription, but not just any old transcription: transcripts that are time-locked to the video (or audio) file, which is what the new crop of speech-to-text APIs provides.

This is the core of the new use of transcription: each word in the transcript is locked to a timestamp in the media file.
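
To make the idea concrete, here is a minimal sketch (in Python, with illustrative field names rather than any particular API's schema) of what a time-locked transcript looks like: each word carries the start and end time at which it is spoken.

```python
# A time-locked transcript: each word carries the start and end time
# (in seconds) at which it occurs in the media file. The field names
# are illustrative, not any particular API's schema.
time_locked_transcript = [
    {"word": "Welcome", "start": 0.00, "end": 0.42},
    {"word": "to",      "start": 0.42, "end": 0.55},
    {"word": "the",     "start": 0.55, "end": 0.68},
    {"word": "show",    "start": 0.68, "end": 1.10},
]
```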

For a time, Adobe included Speech-to-Text functionality in Premiere Pro for navigation by text, but it was removed some years ago. You can still access it by encoding the speech into the media file's metadata using an older version of Premiere Pro, and embedded speech-to-text metadata is still read by current releases. I believe Adobe pulled the Autonomy technology it had licensed because it did not really work all that well.

With time-locked transcripts we can search the text for a word and jump directly to the corresponding time in the video.
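
Given such a transcript, the search itself is simple: find the matching words and return their timestamps, which a player or NLE can then cue to. A sketch, reusing the hypothetical structure above:

```python
def find_word_times(transcript, query):
    """Return the start time of every occurrence of `query` so a
    player or NLE can cue directly to those points in the media."""
    query = query.lower()
    return [w["word"] and w["start"] for w in transcript
            if w["word"].lower().strip(".,!?") == query]

# find_word_times(time_locked_transcript, "show") -> [0.68]
```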

Boris Soundbite uses phonetic search instead of actual transcripts: it searches for instances of a matching waveform, which is very useful for content that has not been transcribed. The underlying Nexidia technology also powers Avid's PhraseFind and ScriptSync, and Avid has now licensed exclusive rights to it for media production via Avid MediaCentral, as well as reselling some Nexidia products.

There are now highly functional speech-to-text technologies, particularly from IBM, Microsoft, Nuance and Google, that are well suited to transcript-enabled workflows and keyword/meaning/sentiment extraction.
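
As one hedged example of what these services return, Google Cloud Speech-to-Text's Python client can be asked for word-level time offsets. The bucket URI below is hypothetical, and field details can vary between client-library versions:

```python
from google.cloud import speech

client = speech.SpeechClient()
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    enable_word_time_offsets=True,  # request per-word timestamps
)
# Hypothetical audio location; any supported source works.
audio = speech.RecognitionAudio(uri="gs://my-bucket/interview.wav")

response = client.recognize(config=config, audio=audio)
for result in response.results:
    for word in result.alternatives[0].words:
        print(word.word, word.start_time, word.end_time)
```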

For Final Cut Pro X, Lumberjack System's Lumberyard app provides a way of merging time-stamped transcripts with Final Cut Pro X XML so the transcripts can be searched inside Final Cut Pro X on clips, multicam clips and synchronized clips. The upcoming SpeedScriber will round-trip clips back to Final Cut Pro X with transcripts after correction in the SpeedScriber app.
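
Lumberyard's own merge format isn't documented here, but the underlying idea can be sketched: express each time-stamped word as an element FCPXML understands, such as a marker with a rational time value. This is an illustration of the concept, not Lumberyard's actual output:

```python
def words_to_fcpxml_markers(transcript, fps=25):
    """Sketch: render time-stamped words as FCPXML <marker> elements.
    FCPXML expresses times as rational values like '17/25s'. This is
    an illustration of the concept, not Lumberyard's actual output."""
    lines = []
    for w in transcript:
        frame = round(w["start"] * fps)
        lines.append(
            f'<marker start="{frame}/{fps}s" duration="1/{fps}s" '
            f'value="{w["word"]}"/>'
        )
    return "\n".join(lines)
```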