The future of video search


One of the transformative technologies of the next 5-10 years will be improved video search. With video becoming the majority of digital content on the web, the ability to find what is relevant and useful is vital. Imagine being able to find, in a world dominated by video content (a shift accelerated as most mobile phones eventually gain video capabilities), the video segments most relevant to what you want. In an interview on Beet.TV, Google’s Gabriel Stricker talks about Google’s ambition to search all video on the web, including the content on YouTube and the dozens of other video hosting sites. As he mentions, only a tiny fraction of existing video is on the web, so part of the task is helping video to migrate to, or become accessible on, the web. On one level, this is about making it easier and more compelling for video creators, professional and amateur, to post their content on the web. Another innovation that will advance this will come when all video cameras and video processing software offer one-step functionality for getting content onto the web.

One thing that Gabriel didn’t mention in the interview was the mechanisms that Google intends to use for video search. At the moment most video search uses only the title, any tags given by the author or others, and potentially the words used in links to the video. To be truly useful, video search needs to index both the words and the images in the video in a meaningful way. The first phase of this is now possible, with fairly good voice recognition technologies allowing traditional text search capabilities to be overlaid on video. Examples include Blinkx and Nexidia, which allow video search using their voice recognition and text indexing capabilities. One of the applications is to have contextual ads next to the video change depending on what people are speaking about as the video proceeds. However, the next phase, recognizing and indexing the images in video, is largely beyond current technologies. Image recognition of even simple objects has proven to be one of the most difficult tasks in artificial intelligence. Massively greater computing power than we currently have available, along with far better evolutionary algorithms, will be necessary to identify with reasonable accuracy what is relevant in video content.
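To make the first phase concrete, here is a minimal sketch of how text search can be overlaid on recognized speech. It assumes a voice recognizer has already produced time-stamped snippets of text; the sample data and the names `TRANSCRIPT`, `build_index` and `search` are invented for illustration and do not describe how Blinkx, Nexidia or Google actually work.

```python
from collections import defaultdict

# Hypothetical recognizer output: (video_id, start_seconds, recognized_text).
TRANSCRIPT = [
    ("interview-01", 0.0, "Google wants to search all video on the web"),
    ("interview-01", 12.5, "Only a tiny fraction of video is on the web"),
    ("demo-02", 3.0, "Voice recognition makes spoken words searchable"),
]


def build_index(segments):
    """Map each lowercased word to the (video_id, start_seconds) pairs where it is spoken."""
    index = defaultdict(list)
    for video_id, start, text in segments:
        # set() ensures a word repeated within one snippet is recorded once.
        for word in set(text.lower().split()):
            index[word].append((video_id, start))
    return index


def search(index, query):
    """Return the snippets that contain every word of the query, sorted by video and time."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


index = build_index(TRANSCRIPT)
for video_id, start in search(index, "video web"):
    print(f"{video_id} at {start:.1f}s")
```

Because every hit carries a start time, the same lookup could in principle drive contextual ads that change as the speakers move from topic to topic.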

The other key aspect of search is embedded video tagging. People don’t want to find the most relevant video, which may take them 10 minutes or an hour to watch; they want to find the most relevant part of the video. This requires widely adopted tagging standards (MPEG-7 is a front-runner), and it requires that both existing and new videos be accurately tagged along their timelines, not just as a whole.
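As a rough illustration of what tagging along the timeline buys you, the sketch below stores tags with start and end times and returns the matching parts of a video rather than the whole thing. It uses a plain Python structure, not the MPEG-7 format, and the names `TimelineTag`, `TAGS` and `find_segments` as well as the sample data are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimelineTag:
    """A label attached to one stretch of a video (hypothetical structure)."""
    video_id: str
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    label: str


# Invented sample tags for a single ten-minute video.
TAGS = [
    TimelineTag("talk-01", 0, 90, "introduction"),
    TimelineTag("talk-01", 90, 420, "voice recognition"),
    TimelineTag("talk-01", 420, 600, "advertising"),
]


def find_segments(tags, label):
    """Return (video_id, start, end) for every tag whose label contains the search text."""
    wanted = label.lower()
    return [(t.video_id, t.start, t.end) for t in tags if wanted in t.label.lower()]


for video_id, start, end in find_segments(TAGS, "advertising"):
    print(f"{video_id}: watch from {start}s to {end}s")
```

With tags like these, a search for "advertising" can send the viewer straight to the last three minutes instead of the start of the video.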

In short, video search is in its infancy, and it has a long, long way to play out. One of the reasons this is so important is that it provides a ready mechanism for personalizing advertising, without needing to know people’s individual identities or profiles (see our framework on advertising personalization in the Future of Media Report 2007). We can expect that people will have literally tens of thousands of video/TV channels to watch; however, this will only be possible if they can be monetized effectively and automatically. Google is announcing its ambitions in this space. Let the game unfold: there are massive rewards to reap.

1 reply
    Lottie B says:

    Hi Ross, I’m from the team at LocateTV, a TV and film search site that has just launched in private beta, and I found your article really interesting and relevant to our struggles too.
    Our search is specific to TV and film listings on TV, online and DVD (region-specific, so you get the right results for the UK and US), and obviously this means we have video content in our database too. We are trying to keep the site very clean and focused on its purpose (Google-esque, you might say!), but the richness of the search is our ongoing concern and video is one of the trickiest areas. Our development head brewed the idea for Locate after seeing a motorbike games clip online and wanting to see the whole thing, so the site came out of this kind of issue.
    We are focusing on professional content rather than user-generated stuff, but it would be interesting to hear, even for this kind of content, how important it is for people to be able to identify individual scenes or segments, or metadata from the show, by indexing the words and images as you suggest.
    Could anyone chip in and add their own user frustrations and needs in this market?
    Cheers, Lottie
