Labeling to create image datasets

Many applications now recognize and tag faces in a photograph, identify a person through a scanner’s eyepiece, or detect the objects in a scene for visual search. These examples show that diverse types of data annotation are needed. All of these object detection applications are built on machine learning methods, which rely on large training datasets.

Current image datasets

A rich range of image datasets now covers many application topics. Datasets are organized by category: Pascal, one of the most common image benchmark datasets, has 20 object classes grouped into people, animals, vehicles and indoor objects.
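As a small illustration of this kind of category organization, the 20 Pascal classes can be written as a mapping from group to class names. This is only a sketch: the dictionary name and the helper function are made up for the example, and the class spellings follow the usual Pascal VOC naming.

```python
# The 20 Pascal object classes, grouped into the four groups named above.
# PASCAL_CLASSES and category_of are illustrative names, not part of any toolkit.
PASCAL_CLASSES = {
    "person": ["person"],
    "animal": ["bird", "cat", "cow", "dog", "horse", "sheep"],
    "vehicle": ["aeroplane", "bicycle", "boat", "bus", "car", "motorbike", "train"],
    "indoor": ["bottle", "chair", "diningtable", "pottedplant", "sofa", "tvmonitor"],
}


def category_of(class_name):
    """Return the group a class belongs to, or None if the class is unknown."""
    for category, classes in PASCAL_CLASSES.items():
        if class_name in classes:
            return category
    return None


total_classes = sum(len(classes) for classes in PASCAL_CLASSES.values())
print(total_classes)        # 20
print(category_of("sofa"))  # indoor
```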

Datasets have different goals. Some are object-centric and are used for object instance detection and segmentation, such as Caltech-256 (with 256 classes) or ImageNet, the most widely used. ImageNet contains more than ten million examples organised according to the WordNet hierarchy of nouns.

On the other hand, there are several scene datasets. MSR is commonly used for full scene segmentation and contains 23 object classes. LabelMe is a dynamic dataset that grows and changes every day because it is built by many users who label objects online. SUN is currently the largest scene dataset, with 130,000 images classified into 908 scene categories and 250,000 segmented objects.

Collecting data: annotation tools

Labeled data is needed in almost all image detection applications, but the only way to obtain the labels (which are stored in the datasets) is to create them manually. Recently, several annotation tools and games have been created to encourage people to label images (and so build image datasets).

In the ESP game, users tag what the pictures contain; the more objects they label, the more points they score. Peekaboom improves on the data collected by the ESP game by locating objects in the images: player 1 reveals parts of the image and player 2 has to guess the associated word.

LabelMe is an online annotation tool from MIT-CSAIL for building image databases for computer vision research. Anybody can label objects in an image for free, but through Amazon Mechanical Turk each user is paid 1 cent per object labeled in the scene, which is a stronger incentive. This way, they have built a training dataset with more than 2,000 annotated images and 30,000 tagged objects.
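To make the idea of a labeled object concrete, here is a minimal sketch of how one annotated image could be represented in code. It assumes each object is stored as a text label plus an outline polygon. The class names, field names and the example file are all hypothetical, and this is not the annotation tool's actual file format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ObjectAnnotation:
    """One labeled object: a tag plus its outline (hypothetical structure)."""
    label: str
    polygon: List[Tuple[float, float]]  # (x, y) vertices in pixels

    def bounding_box(self):
        """Return (x_min, y_min, x_max, y_max) enclosing the polygon."""
        xs = [x for x, _ in self.polygon]
        ys = [y for _, y in self.polygon]
        return (min(xs), min(ys), max(xs), max(ys))


@dataclass
class ImageAnnotation:
    """All labeled objects that belong to one image."""
    filename: str
    objects: List[ObjectAnnotation]

    def labels(self):
        """Return the distinct object labels in alphabetical order."""
        return sorted({obj.label for obj in self.objects})


# Made-up example: one image with two labeled objects.
street = ImageAnnotation(
    filename="street.jpg",
    objects=[
        ObjectAnnotation("car", [(10, 40), (60, 40), (60, 80), (10, 80)]),
        ObjectAnnotation("person", [(70, 20), (85, 20), (80, 90)]),
    ],
)

print(street.labels())                   # ['car', 'person']
print(street.objects[0].bounding_box())  # (10, 40, 60, 80)
```

Storing the full outline rather than only a box keeps the data usable for segmentation, and a bounding box for detection can still be derived from it when needed.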

Source: Course “Object Recognition and Scene Understanding”, Antonio Torralba (CSAIL-MIT), UPC, July 2011. 


Successful Computer Vision Applications

Computer vision consists of methods for acquiring, processing, analysing and understanding images from the real world. It combines vision, robotics and artificial intelligence to recognize objects and understand scenes.

Nowadays there are many successful computer vision applications. Some of them, like Microsoft Kinect, have become very popular, while others are less well known:

  • Microsoft Kinect is a revolutionary human pose recognition system for the Xbox 360. It uses real-time depth and RGB capture for RGB-D mapping (3D maps of the environment), robotic grasping, object recognition and human tracking.
  • Google Goggles is a mobile application (Android and iOS) used for searches based on pictures taken by handheld devices. Goggles uses image recognition technology to recognize objects and return relevant search results. It identifies products, famous landmarks, storefronts, artwork and popular images found online.
  • Iteris has developed innovative vehicle video detection technologies based on video image processing. They analyze real-time video images to help reduce traffic congestion and enhance driver safety.
  • Mobileye vision systems use a smart video camera installed inside the vehicle. Real-time algorithms interpret the captured scene to warn drivers of danger, provide adaptive cruise control and give driver assistance.
  • Hawk-Eye provides ball tracking technology for sports such as football, tennis, baseball and cricket, using multiple cameras and vision algorithms. It is the most popular system of its kind and is used in official tennis competitions. A similar vision algorithm will be implemented to detect football phantom goals in the World Cup.
  • L1 Identity Solutions provides computer vision solutions for security and identification, including fingerprint/palm, iris, face and multi-biometric recognition.

[ You can visit David Lowe’s webpage to see a list of companies that have developed computer vision products. ]