Predicting human gaze beyond pixels

Juan Xu; Ming Jiang; Shuo Wang; Mohan S. Kankanhalli; Qi Zhao

doi:10.1167/14.1.28

Computer Science and Engineering

Research output: Contribution to journal › Article › peer-review

242 Scopus citations

Abstract

A large body of previous models to predict where people look in natural scenes focused on pixel-level image attributes. To bridge the semantic gap between the predictive power of computational saliency models and human behavior, we propose a new saliency architecture that incorporates information at three layers: pixel-level image attributes, object-level attributes, and semantic-level attributes. Object- and semantic-level information is frequently ignored, or only a few sample object categories are discussed where scaling to a large number of object categories is not feasible nor neurally plausible. To address this problem, this work constructs a principled vocabulary of basic attributes to describe object- and semantic-level information thus not restricting to a limited number of object categories. We build a new dataset of 700 images with eye-tracking data of 15 viewers and annotation data of 5,551 segmented objects with fine contours and 12 semantic attributes (publicly available with the paper). Experimental results demonstrate the importance of the object- and semantic-level information in the prediction of visual attention.

Original language	English (US)
Article number	28
Journal	Journal of Vision
Volume	14
Issue number	1
DOIs	https://doi.org/10.1167/14.1.28
State	Published - 2014

Keywords

Computational model
Dataset
Object saliency
Saliency attribute
Semantic saliency
Visual saliency

Cite this

@article{de3bb758ab174466bfba1027f08dfc90,
  title = "Predicting human gaze beyond pixels",
  abstract = "A large body of previous models to predict where people look in natural scenes focused on pixel-level image attributes. To bridge the semantic gap between the predictive power of computational saliency models and human behavior, we propose a new saliency architecture that incorporates information at three layers: pixel-level image attributes, object-level attributes, and semantic-level attributes. Object- and semantic-level information is frequently ignored, or only a few sample object categories are discussed where scaling to a large number of object categories is not feasible nor neurally plausible. To address this problem, this work constructs a principled vocabulary of basic attributes to describe object- and semantic-level information thus not restricting to a limited number of object categories. We build a new dataset of 700 images with eye-tracking data of 15 viewers and annotation data of 5,551 segmented objects with fine contours and 12 semantic attributes (publicly available with the paper). Experimental results demonstrate the importance of the object- and semantic-level information in the prediction of visual attention.",
  keywords = "Computational model, Dataset, Object saliency, Saliency attribute, Semantic saliency, Visual saliency",
  author = "Juan Xu and Ming Jiang and Shuo Wang and Kankanhalli, {Mohan S.} and Qi Zhao",
  year = "2014",
  doi = "10.1167/14.1.28",
  language = "English (US)",
  volume = "14",
  journal = "Journal of Vision",
  issn = "1534-7362",
  publisher = "Association for Research in Vision and Ophthalmology Inc.",
  number = "1",
}

TY  - JOUR
T1  - Predicting human gaze beyond pixels
AU  - Xu, Juan
AU  - Jiang, Ming
AU  - Wang, Shuo
AU  - Kankanhalli, Mohan S.
AU  - Zhao, Qi
PY  - 2014
Y1  - 2014
AB  - A large body of previous models to predict where people look in natural scenes focused on pixel-level image attributes. To bridge the semantic gap between the predictive power of computational saliency models and human behavior, we propose a new saliency architecture that incorporates information at three layers: pixel-level image attributes, object-level attributes, and semantic-level attributes. Object- and semantic-level information is frequently ignored, or only a few sample object categories are discussed where scaling to a large number of object categories is not feasible nor neurally plausible. To address this problem, this work constructs a principled vocabulary of basic attributes to describe object- and semantic-level information thus not restricting to a limited number of object categories. We build a new dataset of 700 images with eye-tracking data of 15 viewers and annotation data of 5,551 segmented objects with fine contours and 12 semantic attributes (publicly available with the paper). Experimental results demonstrate the importance of the object- and semantic-level information in the prediction of visual attention.
KW  - Computational model
KW  - Dataset
KW  - Object saliency
KW  - Saliency attribute
KW  - Semantic saliency
KW  - Visual saliency
UR  - http://www.scopus.com/inward/record.url?scp=84893634313&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84893634313&partnerID=8YFLogxK
U2  - 10.1167/14.1.28
DO  - 10.1167/14.1.28
M3  - Article
C2  - 24474825
AN  - SCOPUS:84893634313
SN  - 1534-7362
VL  - 14
JO  - Journal of Vision
JF  - Journal of Vision
IS  - 1
M1  - 28
ER  -
