Visual-verbal consistency of image saliency

Haoran Liang, Ming Jiang, Ronghua Liang, Qi Zhao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

When looking at an image, humans shift their attention towards interesting regions, making sequences of eye fixations. When describing an image, they also come up with simple sentences that highlight the key elements in the scene. What is the correlation between where people look and what they describe in an image? To investigate this problem, we look into eye fixations and image captions, two types of subjective annotations that are relatively task-free and natural. From the annotations, we extract visual and verbal saliency ranks to compare against each other. We then propose a number of low-level and semantic-level features relevant to the visual-verbal consistency. Integrated into a computational model, the proposed features effectively predict the consistency between the two modalities on a large dataset with both types of annotations, namely SALICON [1].
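The abstract describes comparing visual saliency ranks (from eye fixations) against verbal saliency ranks (from captions). A natural way to quantify agreement between two such rankings is a rank correlation such as Spearman's rho; the sketch below illustrates this idea on invented data. The object labels and rank values are hypothetical, and the paper's actual features and model go well beyond this toy comparison.

```python
# Illustrative sketch (not the paper's method): measuring agreement between
# visual (fixation-based) and verbal (caption-based) saliency ranks with
# Spearman's rank correlation.

def spearman_rho(xs, ys):
    """Spearman's rank correlation for two equal-length rankings
    (assumes no tied ranks, as in this toy example)."""
    n = len(xs)
    d2 = sum((x - y) ** 2 for x, y in zip(xs, ys))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical saliency ranks for five objects in one image (1 = most salient).
# Visual ranks might come from fixation density; verbal ranks from the order
# in which objects are mentioned in the caption.
visual_ranks = [1, 2, 3, 4, 5]
verbal_ranks = [2, 1, 3, 5, 4]

print(spearman_rho(visual_ranks, verbal_ranks))
```

A rho near 1 indicates that what people look at and what they describe largely agree; values near 0 or below indicate the two modalities diverge for that image.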

Original language: English (US)
Title of host publication: 2017 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3489-3494
Number of pages: 6
ISBN (Electronic): 9781538616451
DOIs
State: Published - Nov 27 2017
Event: 2017 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2017 - Banff, Canada
Duration: Oct 5 2017 - Oct 8 2017

Publication series

Name: 2017 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2017
Volume: 2017-January


Keywords

  • Correlation
  • Image caption
  • Visual saliency
  • Visual-verbal consistency
