Integrating Multimodal Features for Automatic Image Annotation

Integrating Multimodal Features for Automatic Image Annotation by Zahid Younas (2015)

Social media systems such as Flickr and Facebook have become popular among web users. These systems allow users to create, organize, search, and share social media content such as images. The content is annotated, organized, and searched with tags (annotations): arbitrary keywords that describe the media. Users usually annotate images manually, which is laborious and time-consuming. To assist users in the tagging process, a tag recommendation system suggests tags for annotating images. The suggestions usually reflect the content of an image. Some state-of-the-art work uses the semantics (context) of the image as well as its content. Content can be extracted from the features (low-level or local) of an image, whereas the semantics are captured by the tags associated with the image. Recommending useful tags for images from both their content and their semantics remains a challenging task that needs further exploration. In this thesis we propose the MFI framework, which integrates multiple feature spaces (multimodal) to give automatic tag suggestions. The framework uses two approaches to tag recommendation. The first approach uses the local (SIFT) features of images and their associated tags as its feature spaces. Integrating these feature spaces yields meaningful tags, because the suggestions are derived from both feature sets jointly and therefore cover both the content and the semantics of an image. In the second approach, we use CSD (color structure descriptor) and EHD (edge histogram descriptor) features to evaluate the effect of different feature sets on tag suggestions. We evaluate the MFI framework using precision and error rate measures on a real Flickr dataset containing 7502 images with associated metadata. We conclude that the framework recommends more meaningful tags when local features and tag features are used together. The MFI framework (with SIFT and tag integration) shows an improvement over the approaches it was compared with.