Derived Metadata

Derived metadata is Content metadata that we can obtain from the Source (generally Technical) and Added metadata without additional human work.

Derived Metadata is calculated using a non-human external information source. It takes the Source metadata (preferably) or Added metadata and uses software algorithms or web APIs (Application Programming Interfaces, which let one application access a web application) to turn the basic facts into useful information.

At our current stage of technology we don’t have a lot of Derived Metadata, but we have hints of what will come.

Location

If we get GPS coordinates as part of our Source metadata, then we can derive location. Right now, few professional cameras incorporate GPS, although the Panasonic HPX 3100 has a hardware option for GPS, and Panasonic’s metadata formats for P2 media and AVCCAM reserve slots for GPS data.

An iPhone, an iPad and most smartphones deliver GPS data with the video file, and iMovie on iPhone has used that data for titling in its themes. There is also a range of GPS-equipped consumer cameras. GPS is something that’s coming to all video cameras.
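
To make “deriving location” concrete, here is a minimal Python sketch. It assumes the coordinates arrive as an ISO 6709 decimal-degree string, the form iPhones write into the com.apple.quicktime.location.ISO6709 metadata key, and matches them against a small hypothetical list of places (PLACES) by great-circle (haversine) distance. A real tool would call a reverse-geocoding service instead; the place list and function names are illustrative only.

```python
import math
import re

# Matches the leading latitude/longitude of an ISO 6709 decimal-degree string,
# e.g. "+34.0522-118.2437+089.000/". Any altitude that follows is ignored.
ISO6709_LAT_LON = re.compile(r"^([+-]\d+(?:\.\d+)?)([+-]\d+(?:\.\d+)?)")

EARTH_RADIUS_KM = 6371.0

# Hypothetical mini-gazetteer standing in for a reverse-geocoding service.
PLACES = {
    "Los Angeles": (34.0522, -118.2437),
    "San Francisco": (37.7749, -122.4194),
    "London": (51.5074, -0.1278),
}


def parse_iso6709(value: str) -> tuple[float, float]:
    """Return (latitude, longitude) in decimal degrees."""
    match = ISO6709_LAT_LON.match(value)
    if match is None:
        raise ValueError(f"not an ISO 6709 decimal-degree string: {value!r}")
    return float(match.group(1)), float(match.group(2))


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against rounding pushing the value just above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def derive_location(iso6709: str, places: dict = PLACES) -> str:
    """Derive a place name (Content metadata) from GPS Source metadata."""
    lat, lon = parse_iso6709(iso6709)
    return min(places, key=lambda name: haversine_km(lat, lon, *places[name]))


print(derive_location("+34.0522-118.2437+089.000/"))  # Los Angeles
```

Nearest-place lookup is the simplest possible design; it says nothing about how far away the nearest place is, which a real service would also report.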

Speech to Text/Transcription

Until recently, most attempts at automatic speech-to-text were disappointing. Adobe offered a speech-to-text feature for a few years, but withdrew it late in 2014.

Now several online APIs, which any developer can access for a very small fee, transcribe speech to text with a very high degree of accuracy. Microsoft claims that its Speech-to-Text engine is “as accurate as a human transcriber”. Besides Microsoft, IBM Watson and Google, among others, have highly accurate speech-to-text engines.
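
Accuracy claims like Microsoft’s are usually stated as word error rate (WER): the number of word substitutions, deletions and insertions needed to turn a human reference transcript into the machine transcript, divided by the number of words in the reference. The minimal Python sketch below computes it with a word-level edit distance; the function name is ours, and real benchmarks normalize punctuation and casing more carefully than this.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference prefix handled so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or exact match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("the" -> "a") in a six-word reference
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Lower is better, and WER can exceed 1.0 when the machine transcript contains many extra words.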

Human transcription would be classed as Added metadata because it is manually added by people.

More on Speech to Text…

Keyword and Meaning Extraction

The companies that develop Speech-to-Text engines also offer developer-accessible APIs for automatically extracting keywords and meaning or concepts, depending on the provider.

Lumberjack System already extracts ‘Magic’ keywords (and keyword ranges) from transcripts provided to it via the Lumberyard desktop app.

From speech we can derive keywords and keyword ranges (or subclips), automating much of the organizational work ahead of editing.
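
As a rough illustration of deriving keywords and keyword ranges from a timed transcript, the Python sketch below counts non-stop-words, then merges nearby mentions of a keyword into time ranges that could become subclips. It is a simple frequency heuristic, not the method behind Lumberjack System’s ‘Magic’ keywords or any vendor’s API. The transcript format (word, start, end in seconds), the stop-word list, the max_gap value and the sample transcript are all made up for illustration.

```python
from collections import Counter

# Tiny illustrative stop-word list; real systems use much larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "is", "it",
              "was", "we", "i", "you", "that", "this", "for", "with"}


def normalize(word: str) -> str:
    """Lower-case a transcript word and strip surrounding punctuation."""
    return word.lower().strip(".,!?;:\"'")


def extract_keywords(words, top_n=3):
    """words: (text, start_sec, end_sec) tuples. Returns the most frequent content words."""
    tokens = (normalize(text) for text, _, _ in words)
    counts = Counter(t for t in tokens if t and t not in STOP_WORDS)
    return [keyword for keyword, _ in counts.most_common(top_n)]


def keyword_ranges(words, keyword, max_gap=10.0):
    """Merge time-ordered mentions of `keyword` at most `max_gap` seconds apart into ranges."""
    ranges = []
    for text, start, end in words:
        if normalize(text) != keyword:
            continue
        if ranges and start - ranges[-1][1] <= max_gap:
            ranges[-1] = (ranges[-1][0], end)  # extend the current range
        else:
            ranges.append((start, end))        # start a new range
    return ranges


transcript = [
    ("Coffee", 0.0, 0.4), ("farming", 0.4, 0.9), ("is", 0.9, 1.0), ("hard.", 1.0, 1.3),
    ("Coffee", 5.0, 5.4), ("prices", 5.4, 5.9), ("fell.", 5.9, 6.2),
    ("Coffee", 40.0, 40.5), ("farming", 40.5, 41.0),
]
for kw in extract_keywords(transcript, top_n=2):
    print(kw, keyword_ranges(transcript, kw))
```

A larger max_gap yields fewer, longer ranges; a smaller one yields more, tighter subclips.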

More on Keyword and Meaning Extraction…

Facial Detection and Recognition

Facial detection finds the presence of one or more faces in an image. Facial Recognition names the faces it finds.
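
Recognition is commonly done by converting each detected face into a numeric vector (an embedding) and comparing it with the embeddings of people who have already been named. The Python sketch below shows only that comparison step, using cosine similarity. It assumes some face model has already produced the embeddings; the toy three-number vectors and the 0.8 threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognize(face_embedding, known_faces, threshold=0.8):
    """Name a detected face by its most similar labelled embedding, or return None."""
    best_name, best_score = None, threshold
    for name, embedding in known_faces.items():
        score = cosine_similarity(face_embedding, embedding)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy 3-number "embeddings"; real face models output much longer vectors.
known = {"Alice": [1.0, 0.0, 0.0], "Bob": [0.0, 1.0, 0.0]}
print(recognize([0.9, 0.1, 0.0], known))  # Alice
print(recognize([0.0, 0.0, 1.0], known))  # None: detected, but not recognized
```

The threshold trades missed names against wrong names: raising it makes the sketch return None more often rather than guess.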

Some apps, including Final Cut Pro X, take the Facial Detection information and infer from the size of the faces in the image whether the shot is Wide, Medium or Close. While this is useful information, more accurate Facial Recognition will give us the names of the people in the shot: very valuable Content Metadata.
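
The shot-size inference can be sketched as a simple ratio test on the largest detected face. The thresholds below (close_min, medium_min) are hypothetical values chosen for illustration, not Final Cut Pro X’s actual rules, and face heights are assumed to come from a face detector as pixel measurements.

```python
def classify_shot(face_heights, frame_height, close_min=0.33, medium_min=0.12):
    """Guess Wide/Medium/Close from the largest face's height as a fraction of frame height.

    The two thresholds are hypothetical, not any application's real values.
    """
    if not face_heights:
        return None  # no faces detected, so shot size can't be inferred this way
    ratio = max(face_heights) / frame_height
    if ratio >= close_min:
        return "Close"
    if ratio >= medium_min:
        return "Medium"
    return "Wide"


print(classify_shot([500], 1080))      # Close  (largest face ~46% of frame height)
print(classify_shot([200, 80], 1080))  # Medium (largest face ~19%)
print(classify_shot([60], 1080))       # Wide   (~6%)
```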

More on Facial Detection and Recognition…

Character, Word and Image Recognition

Technology already exists to read text from signs and similar surfaces within video images. Google’s Word Lens app reads the text on a sign, translates it and displays the translated version over the original. There are even free online services that will extract the text from your camera image.

Using optical character recognition (OCR) to read signage will help determine location more accurately, and may provide additional metadata.

All of the companies that provide Speech-to-Text and Keyword extraction technologies also have image recognition tools to detect and identify the content of an image.

More on Character, Word and Image Recognition…

Emotion Detection

Not surprisingly, the same companies that offer near artificial intelligence capabilities for speech and images also have sentiment or emotion detection algorithms of various types. Sentiment analysis simply determines whether a corpus of speech or text is positive, neutral or negative. Emotion analysis identifies the actual emotions in the content.
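
A minimal lexicon-based classifier in Python shows the positive, neutral or negative idea. The word lists are tiny hypothetical examples; the commercial services mentioned above generally use trained models rather than hand-made lists, and emotion analysis would map content to categories such as fear or joy rather than a single polarity.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE = {"good", "great", "love", "happy", "excellent", "wonderful"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "awful", "angry"}


def sentiment(text: str) -> str:
    """Classify text as positive, neutral or negative by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this great film"))          # positive
print(sentiment("That was a terrible, awful day"))  # negative
print(sentiment("The meeting is at noon"))          # neutral
```

Simple counting like this misses negation (“not good”) and sarcasm, which is one reason the commercial engines rely on models instead.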

IBM Watson’s emotion detection algorithm was used to extract ‘scary’ moments to use as selects for the trailer for the feature film Morgan.
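
The general idea of using emotion scores to pull selects can be sketched as ranking scored segments and keeping the strongest ones in timeline order. This does not reproduce IBM’s process for the Morgan trailer. The Segment type, its per-segment fear score, the min_score cutoff and the sample scores are hypothetical stand-ins for whatever an emotion-detection service actually returns.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float  # seconds from the start of the clip
    end: float
    fear: float   # hypothetical 0-1 score from an emotion-detection service


def pick_selects(segments, count=3, min_score=0.5):
    """Keep the `count` highest-scoring segments above `min_score`, in timeline order."""
    candidates = [s for s in segments if s.fear >= min_score]
    strongest = sorted(candidates, key=lambda s: s.fear, reverse=True)[:count]
    return sorted(strongest, key=lambda s: s.start)


scored = [
    Segment(0, 5, 0.10), Segment(5, 10, 0.90), Segment(10, 15, 0.60),
    Segment(15, 20, 0.95), Segment(20, 25, 0.40),
]
for s in pick_selects(scored, count=2):
    print(f"{s.start}-{s.end}s fear={s.fear}")
```

Returning the selects in timeline order rather than score order keeps them easy for an editor to review against the source.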

Other companies, such as Emotient, Affectiva and Intraface, focus on fully recognizing human emotion. Apple purchased Emotient in January 2016, taking the technology in house.

More on Emotion Detection…
