Weakly Supervised Learning

The standard approach to object detection is like the "Where's Wally?" puzzle in Fig. 1: given example images of Wally, we can find Wally in any new image.


Figure 1: The "Where's Wally?" puzzle illustrates the standard approach to object detection. Given images of Wally, find Wally in the given scene.

In computer vision terms this is called a fully-supervised approach to training object detectors. However, the human visual system is much more sophisticated. Consider the "Who is Molly?" puzzle in Fig. 2. Here we are given a set of images with Molly and a set of images without Molly, and we are asked to figure out what Molly looks like and to locate her in every image in which she appears.


Figure 2: The "Who is Molly?" puzzle goes beyond object detection into simultaneous modelling and localisation. Given images with and without Molly, work out what Molly looks like and find her in the images in which she appears. (Solution)

In computer vision terms this is a weakly-supervised approach to training object detectors: we are not given localised examples of Molly to learn a detector from, only weak labels that indicate which images she appears in. The fully-supervised problem has been studied in computer vision for some time, whereas the weakly-supervised problem is still in its infancy. My PhD thesis focused on weakly-supervised approaches to training object and action detectors.
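The difference between the two settings comes down to what the training annotations contain. The following is only a minimal sketch of that difference, not code from the thesis: the class names, the box format, and the file names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class FullySupervisedExample:
    """'Where's Wally?' setting: we are told where the object is."""
    image_id: str
    boxes: List[Box]  # one box per object instance; empty if absent


@dataclass
class WeaklySupervisedExample:
    """'Who is Molly?' setting: we are told only whether the object appears."""
    image_id: str
    contains_object: bool


def to_weak_label(example: FullySupervisedExample) -> WeaklySupervisedExample:
    """Discard the location information, keeping only the image-level label."""
    return WeaklySupervisedExample(example.image_id, len(example.boxes) > 0)


# Illustrative data only; the file names and coordinates are made up.
full = [
    FullySupervisedExample("scene_1.jpg", [(40, 60, 80, 140)]),
    FullySupervisedExample("scene_2.jpg", []),
]
weak = [to_weak_label(e) for e in full]
```

A weakly-supervised method has to recover something like the `boxes` field on its own, starting from the `contains_object` flags alone.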


[1] P. Siva "Automatic Annotation for Weakly Supervised Learning of Detectors", PhD Thesis, Queen Mary University of London, 2012.

[2] P. Siva, C. Russell, and T. Xiang "In Defence of Negative Mining for Annotating Weakly Labelled Data", 12th European Conference on Computer Vision, Florence, Oct. 7-13 2012.

[3] P. Siva and T. Xiang "Weakly Supervised Object Detector Learning with Model Drift Detection", 13th IEEE International Conference on Computer Vision, Barcelona, Nov. 6-13 2011.

[4] P. Siva and T. Xiang "Weakly Supervised Action Detection", Proceedings of the British Machine Vision Conference, Dundee, Aug. 29-Sept. 2 2011.

[5] P. Siva and T. Xiang "Action Detection in Crowd", Proceedings of the British Machine Vision Conference, Aberystwyth, Aug. 31-Sept. 3 2010, pp.9.1-9.11. (Best Poster Prize)