While – for the moment – we are limited to Content Metadata largely added by humans somewhere in the pre-edit process, a tectonic shift is coming as computer-assisted methods for deriving Content Metadata mature.
Until now, using Content Metadata for pre-editing has meant adding it manually, but a whole range of technologies has reached maturity and will start driving the acquisition of Content Metadata – and some basic pre-editing – over the next couple of years.
Technologies like Speech-to-Text Transcription; keyword and meaning extraction; facial detection and recognition; character, word and image recognition; and emotion detection will be the drivers of the organization and pre-editing processes.
Once we have every word indexed against the media file, we can search directly for the content of the spoken words in a project.
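A minimal sketch of what such a word-level index could look like, assuming each clip's speech-to-text output arrives as a list of (word, start time) pairs – the clip names, data shape, and function names here are invented for illustration:

```python
from collections import defaultdict

def build_word_index(transcripts):
    """transcripts: {clip_name: [(word, start_seconds), ...]}
    Returns a mapping from each spoken word to every (clip, time) it occurs."""
    index = defaultdict(list)
    for clip, words in transcripts.items():
        for word, start in words:
            index[word.lower()].append((clip, start))
    return index

def find_word(index, word):
    """Case-insensitive lookup of every occurrence of a spoken word."""
    return index.get(word.lower(), [])

# Hypothetical transcript data for two clips.
transcripts = {
    "interview_01.mov": [("the", 0.0), ("budget", 0.4), ("doubled", 0.9)],
    "interview_02.mov": [("our", 0.0), ("budget", 0.5), ("was", 1.1)],
}
index = build_word_index(transcripts)
print(find_word(index, "budget"))
# [('interview_01.mov', 0.4), ('interview_02.mov', 0.5)]
```

Searching for a phrase jumps the editor straight to the timecode in the source media, rather than to a clip that then has to be scrubbed by hand.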
With all that in place, imagine your pre-edit preparation a few years from now:
After ingesting your footage, it is analyzed for content. If it is mostly spoken word, it is converted from speech to text and aligned with the media so you can search for words or phrases in the clips.
The key concepts are extracted and identified with keyword ranges or subclips, and string-outs of the core concepts are assembled.
Ranges within clips with strong emotion are identified.
Image content in B-roll is identified and tagged appropriately.
Basic subject-based edits are prepared for craft editors to complete.
Speech on takes is identified and aligned with the script, organized by Scene.
Basic script-based string outs are created from dailies, by Scene, with alternative takes presented (like a Final Cut Pro X audition) with circle takes selected as the active clip.
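One step in the workflow above – turning per-word analysis results into keyword ranges or subclips – can be sketched simply. This assumes a hypothetical analysis pass has already attached tags (keywords, emotions, whatever the recognizer produces) to timestamped words; merging adjacent tagged words into in/out points gives an editor reviewable ranges:

```python
def tag_ranges(words, tag, gap=1.0):
    """words: [(word, start_secs, end_secs, set_of_tags), ...]
    Returns merged (in, out) ranges where `tag` applies, joining words
    separated by less than `gap` seconds into one subclip."""
    ranges = []
    for word, start, end, tags in words:
        if tag in tags:
            if ranges and start - ranges[-1][1] < gap:
                ranges[-1][1] = end  # extend the current range
            else:
                ranges.append([start, end])  # open a new range
    return [tuple(r) for r in ranges]

# Invented example data: one clip's tagged transcript.
words = [
    ("the",     0.0, 0.2, set()),
    ("budget",  0.2, 0.8, {"finance"}),
    ("doubled", 0.8, 1.4, {"finance"}),
    ("which",   1.4, 1.7, set()),
    ("hurt",    5.0, 5.4, {"finance"}),
]
print(tag_ranges(words, "finance"))
# [(0.2, 1.4), (5.0, 5.4)]
```

The same merge logic works whether the tags come from keyword extraction or emotion detection; only the upstream analysis differs.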
While this is a little sci-fi at the beginning of 2017, most of the foundations are now in place and it is only a matter of time before some smart software developer makes these enhanced workflows commonplace.
More on Speech-to-text…
More on Keyword and Meaning Extraction…
More on Emotion Detection…