Nowadays there are many applications that recognize and tag faces in a photograph, identify a person through a scanner's eyepiece, or detect the objects in a scene for visual search. These examples show that diverse types of data annotation are needed. Object detection applications are built on machine learning methods, which rely on large training datasets.
Current image datasets
There is now a rich range of image datasets covering several application areas. Datasets are organized by category: Pascal, one of the most common image benchmark datasets, has 20 object classes grouped into people, animals, vehicles, and indoor objects.
Datasets have different goals. Some are object-centric and used for object instance detection and segmentation, such as Caltech-256 (with 256 classes) or ImageNet, the most widely used. ImageNet contains tens of millions of examples organised according to the WordNet hierarchy of nouns.
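To make ImageNet's WordNet-based organization concrete, here is a toy sketch: each class can be located by walking its chain of hypernyms (more general nouns) up to the root. The hierarchy below is a tiny hand-built fragment for illustration only, not the real WordNet data, which contains tens of thousands of noun synsets.

```python
# Hand-built fragment of a WordNet-like noun hierarchy
# (illustrative only; not the real WordNet data).
HYPERNYM = {
    "husky": "dog",
    "dog": "canine",
    "canine": "mammal",
    "mammal": "animal",
    "animal": "entity",  # "entity" acts as the root here
}

def hypernym_path(noun):
    """Walk from a noun up to the root of the hierarchy."""
    path = [noun]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path

print(" -> ".join(hypernym_path("husky")))
# husky -> dog -> canine -> mammal -> animal -> entity
```

In ImageNet, images of huskies would be attached to the "husky" node, so a query for the broader category "dog" can also retrieve them by following the hierarchy downwards.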
On the other hand, there are several scene datasets. MSR is commonly used for full scene segmentation and contains 23 object classes. LabelMe is a dynamic dataset that grows and changes every day, because it is built by many users who label objects online. SUN is currently the largest scene dataset: it contains 130,000 images classified into 908 scene categories, with 250,000 segmented objects.
Collecting data: annotation tools
Labeled data is needed in almost all image detection applications, but the only way to obtain the labels in these datasets is manually. Recently, several annotation tools and games have been created to encourage people to label images (and thus build image datasets).
In the ESP game, users have to tag what a picture contains; the more objects they label, the more points they earn. Peekaboom improves on the data collected by the ESP game by also locating the objects in the images: player 1 reveals parts of the image and player 2 has to guess the associated word.
LabelMe is an online annotation tool from MIT CSAIL for building image databases for computer vision research. Anyone can label objects in an image for free, but through Amazon Mechanical Turk each user can instead be paid one cent per labeled object in a scene, which is a stronger incentive. This way, they have built a training dataset with more than 2,000 annotated images and 30,000 tagged objects.
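LabelMe stores each user's annotations as XML, where every labeled object carries a name and a polygon outline. The sketch below parses a minimal annotation in the spirit of that format; the example file is hand-written and simplified (real LabelMe files carry additional fields, such as the source folder and image size).

```python
import xml.etree.ElementTree as ET

# Minimal, hand-written annotation in the style of LabelMe's
# XML format (real files contain more fields).
ANNOTATION = """
<annotation>
  <filename>street.jpg</filename>
  <object>
    <name>car</name>
    <polygon>
      <pt><x>10</x><y>20</y></pt>
      <pt><x>60</x><y>20</y></pt>
      <pt><x>60</x><y>50</y></pt>
      <pt><x>10</x><y>50</y></pt>
    </polygon>
  </object>
</annotation>
"""

def parse_objects(xml_text):
    """Return a list of (label, polygon) pairs from one annotation."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        polygon = [(int(pt.findtext("x")), int(pt.findtext("y")))
                   for pt in obj.iter("pt")]
        objects.append((label, polygon))
    return objects

print(parse_objects(ANNOTATION))
# [('car', [(10, 20), (60, 20), (60, 50), (10, 50)])]
```

Because the annotations are plain XML, a training pipeline can extract labels and outlines with a standard parser and convert the polygons into segmentation masks or bounding boxes as needed.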