Research

Published 19 January 2007

Focus on research: computer scientist Theo Gevers

Photo: Bob Bronshoff.

If you have ever searched the Internet for that special romantic scene or the replay of that fantastic goal, you will know that retrieving video images is not easy. A European consortium led by the Intelligent Systems Laboratory Amsterdam (ISLA), where Theo Gevers works, was awarded 2.8 million euros by the European Union last summer to do something about this. Over the next three years, in collaboration with leading research groups across Europe, ISLA plans to develop a search engine that can find videos. Within a few years, the right search query should make it possible to find that special scene among all the movies on the Internet.

The new programme will be a kind of video Google: a search query will let you find the right image or video. To do this, the software must be able to distinguish a vast number of concepts. A football scene is relatively easy to identify, but how do you explain something more abstract to a computer, such as ‘a world leader’, ‘two people shaking hands’ or, more extreme still, ‘love’ or ‘romance’? Gevers explains how this works. ‘We need to train the software. If we want the programme to recognise an owl, we show it lots of pictures containing owls. The programme then works out for itself which characteristics are specific to owls.’
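The training idea Gevers describes, learning a concept from many example pictures, can be illustrated with a nearest-centroid classifier, one of the simplest learning methods. This is a minimal sketch and not ISLA's method: the three-number feature vectors and their values are hypothetical stand-ins for the image characteristics a real system would extract, and the concept names are only examples.

```python
import math

# Hypothetical features per image: (brown, round eyes, green grass), each 0..1.
Features = list[float]


def train_centroid(examples: list[Features]) -> Features:
    """Average the feature vectors of all training images of one concept."""
    n = len(examples)
    return [sum(column) / n for column in zip(*examples)]


def classify(features: Features, centroids: dict[str, Features]) -> str:
    """Return the concept whose average example lies closest to `features`."""
    return min(centroids, key=lambda name: math.dist(features, centroids[name]))


centroids = {
    "owl": train_centroid([[0.8, 0.9, 0.1], [0.7, 0.8, 0.2]]),
    "football": train_centroid([[0.1, 0.2, 0.9], [0.2, 0.1, 0.8]]),
}
print(classify([0.75, 0.9, 0.2], centroids))  # owl
```

The more example pictures per concept, the more reliable the averaged "typical" owl becomes; real systems use far richer features and classifiers.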

The programme will make use of the various information sources linked to an image or video file on the Internet. ‘We don’t just look at the image itself; we also check whether the concept we are looking for appears in the text around it. Perhaps the video is subtitled, or there may be other text in the images. We also use speech recognition, for example: if we know what is being said in the movie, we can search that spoken text as well. The soundtrack also reveals a lot about a video. And of course we look at visual information: whether something is moving, which colours are present and what the transitions between colours look like.’
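Combining several sources of evidence can be sketched as a weighted average of per-source scores (so-called late fusion). The article does not say how the sources are combined; the modality names, the weights in `WEIGHTS` and the scores below are all hypothetical, and a real system would learn such weights from data.

```python
# Hypothetical per-modality weights; not ISLA's actual values.
WEIGHTS = {"text": 0.2, "speech": 0.3, "audio": 0.1, "visual": 0.4}


def fuse_scores(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of the modality scores available for one shot.

    Missing modalities (e.g. a video without subtitles) are skipped and the
    remaining weights are renormalised so that they still sum to 1.
    """
    available = [m for m in weights if m in scores]
    if not available:
        return 0.0
    total_weight = sum(weights[m] for m in available)
    return sum(weights[m] * scores[m] for m in available) / total_weight


# Hypothetical scores for how well each shot matches the query "football".
shots = {
    "shot_a": {"text": 0.9, "visual": 0.7},
    "shot_b": {"speech": 0.2, "visual": 0.3, "audio": 0.1},
}
ranked = sorted(shots, key=lambda s: fuse_scores(shots[s]), reverse=True)
print(ranked)  # ['shot_a', 'shot_b']
```

Renormalising over the available sources means a shot is not penalised simply because, say, it has no soundtrack or subtitles.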

Gevers says that Google is also working on a video retrieval system. ‘They only plan to search the text accompanying the video. However, since not all videos have an accurate description, Google is currently developing a game in which users are asked to describe video shots. Google then takes the intersection of these descriptions, as there are bound to be similarities. If hundreds of thousands of people take part, it will be possible to obtain a single description of each video, which Google can then match against search queries. Google’s method depends entirely on the descriptions made by the users themselves.’
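The consensus step in Google's game, as Gevers describes it, can be approximated by keeping only the words that many players use. In this hypothetical sketch, `min_share=1.0` gives the strict intersection of all descriptions, while a lower value tolerates players who leave out a word; nothing here reflects how Google actually implemented it.

```python
from collections import Counter


def consensus_terms(descriptions: list[str], min_share: float = 0.5) -> set[str]:
    """Return the words that occur in at least `min_share` of the descriptions.

    With min_share=1.0 this is the strict intersection of all descriptions.
    """
    if not descriptions:
        return set()
    counts = Counter()
    for text in descriptions:
        # Count each word at most once per description.
        counts.update(set(text.lower().split()))
    needed = min_share * len(descriptions)
    return {word for word, n in counts.items() if n >= needed}


labels = [
    "goal scored in football match",
    "football player scores a goal",
    "crowd cheering at football goal",
]
print(sorted(consensus_terms(labels, min_share=1.0)))  # ['football', 'goal']
```

Note that "scored" and "scores" count as different words here; a practical system would at least reduce words to a common stem.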

Windmills

Gevers' method doesn’t depend on the descriptive talents of users, but it does entail a vast amount of work. However, he doesn’t have to start from scratch. ‘We already have plenty of experience in object recognition; my group is very experienced at that. I was one of the first to develop a search engine for images, because I felt that colour could be very important.’ Previously, search engines made almost no use of colour, since colour depends strongly on the light source: an object looks a completely different colour in full sunlight than under fluorescent light or an ordinary light bulb. Gevers has developed a programme that corrects for this difference. He has also tackled the problem of shape invariance: the shape of a windmill varies greatly depending on the angle from which it is viewed (see illustration below). Yet the computer must be able to identify a windmill in all these different shapes.

Images of a windmill, rotated by 5 degrees between successive images. Because the windmill is pictured from a different angle each time, its shape changes from image to image. Yet the computer must be able to recognise the same windmill in all these different shapes. The images are the property of the Amsterdam Library of Object Images [ALOI], Jan-Mark Geusebroek, http://staff.science.uva.nl/~aloi/
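Gevers' own colour-correction method is not described in this article. As a stand-in, the sketch below uses the much simpler, well-known gray-world technique, which assumes that the light source scales each colour channel by a constant factor and that the average colour of a scene is neutral grey. The fixed `target` level and the example pixels and light colours are assumptions; the target also removes overall brightness differences.

```python
Pixel = tuple[float, float, float]  # (R, G, B), each roughly 0..1


def gray_world(pixels: list[Pixel], target: float = 0.5) -> list[Pixel]:
    """Scale each channel so that its mean over the image equals `target`."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    # A channel boosted by the light source (e.g. red under a bulb) gets a smaller gain.
    gains = [target / m if m > 0 else 1.0 for m in means]
    return [tuple(p[c] * gains[c] for c in range(3)) for p in pixels]


def illuminate(pixels: list[Pixel], light: Pixel) -> list[Pixel]:
    """Simulate a light source as a per-channel multiplication (diagonal model)."""
    return [tuple(p[c] * light[c] for c in range(3)) for p in pixels]


scene = [(0.8, 0.2, 0.2), (0.2, 0.6, 0.3), (0.5, 0.5, 0.5)]
sunlight = illuminate(scene, (1.0, 1.0, 1.0))
bulb = illuminate(scene, (1.4, 1.0, 0.6))  # hypothetical warm, reddish light

# After correction, both versions of the scene agree up to rounding error.
diff = max(abs(x - y)
           for a, b in zip(gray_world(sunlight), gray_world(bulb))
           for x, y in zip(a, b))
print(diff < 1e-9)  # True
```

Gray-world fails when a scene really is dominated by one colour (a football pitch, for instance), which is one reason research systems use more sophisticated colour-invariant features.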

First the colours must be corrected, and then the programme can look for important characteristics. ‘These are points where something interesting happens, such as a colour transition or a corner. That is where information can be obtained, so we place an ellipse around each such point (see illustration below); the ellipse also indicates the scale and orientation of the important information. Transforming the ellipse into a circle yields scale-independent characteristics: we end up with identically sized circles containing information. One circle might, for example, tell us that it contains some red and blue. We then count how often circles with some red and blue occur. If we show the programme lots of images of a certain subject, it learns which circles belong to that subject, so it knows which points are important for that subject and can look for the right points in all kinds of images.’

Transforming the ellipses into circles yields scale- and rotation-independent characteristics. Diagram courtesy of Sietse Dijkstra
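The two steps in the quote, normalising each ellipse to a circle and counting how often each kind of circle occurs, can be sketched as follows. The sketch assumes the ellipse is given as a symmetric positive-definite matrix M, the ellipse being the points x with x^T M x = 1; multiplying by the matrix square root of M then maps the ellipse onto the unit circle. The two-number descriptors (amount of red, amount of blue) and the four-entry `CODEBOOK` are hypothetical simplifications, and learning which counts belong to a subject would be a separate classification step, not shown.

```python
import math

Matrix = tuple[tuple[float, float], tuple[float, float]]
Vector = tuple[float, float]


def ellipse_matrix(a: float, b: float, angle: float) -> Matrix:
    """Matrix M of the ellipse x^T M x = 1 with semi-axes a, b, rotated by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    p, q = 1 / a**2, 1 / b**2
    # M = R diag(p, q) R^T, with R the rotation by `angle`.
    return ((c * c * p + s * s * q, c * s * (p - q)),
            (c * s * (p - q), s * s * p + c * c * q))


def sqrt_spd(m: Matrix) -> Matrix:
    """Square root of a 2x2 symmetric positive-definite matrix (closed form)."""
    (m00, m01), (m10, m11) = m
    s = math.sqrt(m00 * m11 - m01 * m10)  # sqrt(det M)
    t = math.sqrt(m00 + m11 + 2 * s)      # sqrt(trace M + 2 sqrt(det M))
    return (((m00 + s) / t, m01 / t), (m10 / t, (m11 + s) / t))


def apply(m: Matrix, v: Vector) -> Vector:
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


# A = sqrt(M) maps every point of the ellipse onto the unit circle, so regions of
# any size and elongation end up as identical circles. (A leftover rotation is
# usually fixed separately, e.g. by a dominant orientation; not shown.)
a, b, angle = 4.0, 2.0, math.radians(30)
A = sqrt_spd(ellipse_matrix(a, b, angle))
c, s = math.cos(angle), math.sin(angle)
for t in (0.0, 1.0, 2.5):
    # A point on the ellipse: rotate (a cos t, b sin t) by `angle`.
    x = (c * a * math.cos(t) - s * b * math.sin(t), s * a * math.cos(t) + c * b * math.sin(t))
    print(round(math.hypot(*apply(A, x)), 9))  # 1.0

# Count circles per "visual word": each circle's descriptor (hypothetically just
# the amount of red and blue) is assigned to the nearest codebook entry.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # none, red, blue, red+blue


def bag_of_words(descriptors: list[Vector], codebook: list[Vector]) -> list[int]:
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: math.dist(codebook[k], d))
        hist[nearest] += 1
    return hist


print(bag_of_words([(0.9, 0.8), (0.95, 0.9), (0.1, 0.9), (0.05, 0.1)], CODEBOOK))  # [1, 0, 1, 2]
```

The histogram, here two "red and blue" circles, one "blue" and one "neither", is the kind of count a classifier could then learn to associate with a subject.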

Presenting the retrieved videos is also a problem: how can users find the particular shots they need among all the shots retrieved? Gevers is working on this too. The collection being searched also makes a big difference: it is easier, for example, to search a set of news videos than a set of home videos. ‘Compared with home videos, the news is very structured. It’s relatively easy to find a football match in the news: the green grass and the football are easy to identify. 91% of the images retrieved are indeed of football matches. The results are less good for world leaders, for example.’ There are indeed world leaders among the images Gevers shows us: more than half of the images are correct. The remaining images also show many men in suits who, however, are not world leaders. The software has apparently used the characteristic ‘tie’ as a search criterion, although ties are hardly unique to world leaders. Gevers, apologetically: ‘But in this case only visual characteristics were taken into account.’
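The 91% figure is a precision: the share of retrieved items that are actually relevant. A minimal sketch, with a hypothetical result list of 100 shots; the article does not say how many images were retrieved.

```python
def precision(retrieved: list[str], relevant: set[str]) -> float:
    """Share of the retrieved items that are actually relevant."""
    if not retrieved:
        return 0.0
    hits = sum(1 for item in retrieved if item in relevant)
    return hits / len(retrieved)


# Hypothetical run: 100 shots retrieved for "football", 91 of them really football.
retrieved = [f"shot{i}" for i in range(100)]
football_shots = {f"shot{i}" for i in range(91)}
print(precision(retrieved, football_shots))  # 0.91
```

For the world-leader query, "more than half correct" corresponds to a precision above 0.5; the men in suits count as retrieved but not relevant.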

Mona Lisa

Besides developing software to retrieve video images, Gevers is also working on various other projects. Together with Nicu Sebe, for example, he has developed software that recognises facial expressions. ‘We would like to improve the interaction between people and machines. At present we use keyboards to communicate with the computer, but wouldn’t it be much more convenient just to be able to talk to your computer? In that case it would be nice if the computer were able to recognise your mood and respond accordingly.’ Gevers and Sebe had the software analyse Mona Lisa’s face, and it turned out that she is 83% happy! The group’s emotion recognition system attracted a lot of interest worldwide. Meanwhile, his students are using the software to control computer games. ‘It’s probably more fun to watch the players’ faces than the games,’ laughs Gevers.

Society

There are various reasons why Gevers does research in this area. ‘A scientist should carry out research that suits him. It shouldn’t be too broad and, of course, a lot also depends on funding: I can only appoint a PhD student if I have the money, so funds have to be available somewhere.’ The European partnership led by ISLA was awarded 2.8 million euros following a European Union call for video and audio research projects. ‘You need to be able to take the initiative and see what is needed, and then link that to what you would like to investigate yourself.’ Gevers’ research into the ultimate video retrieval system appears to be driven by practical considerations. However, it also has a more idealistic background. ‘In the end, I believe that making information more accessible contributes to a better world. The more information people can access, the better our society can become.’

It will be some time before Gevers’ video search system contributes to this better world. Gevers will be satisfied if the programme can recognise 1000 concepts three years from now. ‘When I want to find something on the Internet, I use Google,’ he admits with a laugh. ‘For the time being, anyway!’


Source: Afdeling Communicatie (Communications Department)