SALICON: Reducing the semantic gap in saliency prediction by adapting deep neural networks

Xun Huang; Chengyao Shen; Xavier Boix; Qi Zhao

doi:10.1109/ICCV.2015.38

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

475 Scopus citations

Abstract

Saliency in Context (SALICON) is an ongoing effort that aims at understanding and predicting visual attention. Conventional saliency models typically rely on low-level image statistics to predict human fixations. While these models perform significantly better than chance, there is still a large gap between model prediction and human behavior. This gap is largely due to the limited capability of models in predicting eye fixations with strong semantic content, the so-called semantic gap. This paper presents a focused study to narrow the semantic gap with an architecture based on Deep Neural Network (DNN). It leverages the representational power of high-level semantics encoded in DNNs pretrained for object recognition. Two key components are fine-tuning the DNNs fully convolutionally with an objective function based on the saliency evaluation metrics, and integrating information at different image scales. We compare our method with 14 saliency models on 6 public eye tracking benchmark datasets. Results demonstrate that our DNNs can automatically learn features particularly for saliency prediction that surpass by a big margin the state-of-the-art. In addition, our model ranks top to date under all seven metrics on the MIT300 challenge set.

Original language	English (US)
Title of host publication	2015 International Conference on Computer Vision, ICCV 2015
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	262-270
Number of pages	9
ISBN (Electronic)	9781467383912
DOIs	https://doi.org/10.1109/ICCV.2015.38
State	Published - Feb 17 2015
Event	15th IEEE International Conference on Computer Vision, ICCV 2015 - Santiago, Chile Duration: Dec 11 2015 → Dec 18 2015

Publication series

Name	Proceedings of the IEEE International Conference on Computer Vision
Volume	2015 International Conference on Computer Vision, ICCV 2015
ISSN (Print)	1550-5499

Other

Other	15th IEEE International Conference on Computer Vision, ICCV 2015
Country/Territory	Chile
City	Santiago
Period	12/11/15 → 12/18/15

Bibliographical note

Publisher Copyright:
© 2015 IEEE.

Cite this

Huang, X., Shen, C., Boix, X., & Zhao, Q. (2015). SALICON: Reducing the semantic gap in saliency prediction by adapting deep neural networks. In 2015 International Conference on Computer Vision, ICCV 2015 (pp. 262-270). Article 7410395 (Proceedings of the IEEE International Conference on Computer Vision; Vol. 2015 International Conference on Computer Vision, ICCV 2015). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCV.2015.38

SALICON: Reducing the semantic gap in saliency prediction by adapting deep neural networks. / Huang, Xun; Shen, Chengyao; Boix, Xavier et al.
2015 International Conference on Computer Vision, ICCV 2015. Institute of Electrical and Electronics Engineers Inc., 2015. p. 262-270 7410395 (Proceedings of the IEEE International Conference on Computer Vision; Vol. 2015 International Conference on Computer Vision, ICCV 2015).

Huang, X, Shen, C, Boix, X & Zhao, Q 2015, SALICON: Reducing the semantic gap in saliency prediction by adapting deep neural networks. in 2015 International Conference on Computer Vision, ICCV 2015., 7410395, Proceedings of the IEEE International Conference on Computer Vision, vol. 2015 International Conference on Computer Vision, ICCV 2015, Institute of Electrical and Electronics Engineers Inc., pp. 262-270, 15th IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, 12/11/15. https://doi.org/10.1109/ICCV.2015.38

@inproceedings{09b63df7218c42c1b80144ce4f35baab,
  title = "SALICON: Reducing the semantic gap in saliency prediction by adapting deep neural networks",
  abstract = "Saliency in Context (SALICON) is an ongoing effort that aims at understanding and predicting visual attention. Conventional saliency models typically rely on low-level image statistics to predict human fixations. While these models perform significantly better than chance, there is still a large gap between model prediction and human behavior. This gap is largely due to the limited capability of models in predicting eye fixations with strong semantic content, the so-called semantic gap. This paper presents a focused study to narrow the semantic gap with an architecture based on Deep Neural Network (DNN). It leverages the representational power of high-level semantics encoded in DNNs pretrained for object recognition. Two key components are fine-tuning the DNNs fully convolutionally with an objective function based on the saliency evaluation metrics, and integrating information at different image scales. We compare our method with 14 saliency models on 6 public eye tracking benchmark datasets. Results demonstrate that our DNNs can automatically learn features particularly for saliency prediction that surpass by a big margin the state-of-the-art. In addition, our model ranks top to date under all seven metrics on the MIT300 challenge set.",
  author = "Xun Huang and Chengyao Shen and Xavier Boix and Qi Zhao",
  note = "Publisher Copyright: {\textcopyright} 2015 IEEE.; 15th IEEE International Conference on Computer Vision, ICCV 2015 ; Conference date: 11-12-2015 Through 18-12-2015",
  year = "2015",
  month = feb,
  day = "17",
  doi = "10.1109/ICCV.2015.38",
  language = "English (US)",
  series = "Proceedings of the IEEE International Conference on Computer Vision",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  pages = "262--270",
  booktitle = "2015 International Conference on Computer Vision, ICCV 2015",
}

TY - GEN
T1 - SALICON
T2 - 15th IEEE International Conference on Computer Vision, ICCV 2015
AU - Huang, Xun
AU - Shen, Chengyao
AU - Boix, Xavier
AU - Zhao, Qi
PY - 2015/2/17
Y1 - 2015/2/17
AB - Saliency in Context (SALICON) is an ongoing effort that aims at understanding and predicting visual attention. Conventional saliency models typically rely on low-level image statistics to predict human fixations. While these models perform significantly better than chance, there is still a large gap between model prediction and human behavior. This gap is largely due to the limited capability of models in predicting eye fixations with strong semantic content, the so-called semantic gap. This paper presents a focused study to narrow the semantic gap with an architecture based on Deep Neural Network (DNN). It leverages the representational power of high-level semantics encoded in DNNs pretrained for object recognition. Two key components are fine-tuning the DNNs fully convolutionally with an objective function based on the saliency evaluation metrics, and integrating information at different image scales. We compare our method with 14 saliency models on 6 public eye tracking benchmark datasets. Results demonstrate that our DNNs can automatically learn features particularly for saliency prediction that surpass by a big margin the state-of-the-art. In addition, our model ranks top to date under all seven metrics on the MIT300 challenge set.
UR - http://www.scopus.com/inward/record.url?scp=84973923049&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84973923049&partnerID=8YFLogxK
U2 - 10.1109/ICCV.2015.38
DO - 10.1109/ICCV.2015.38
M3 - Conference contribution
AN - SCOPUS:84973923049
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 262
EP - 270
BT - 2015 International Conference on Computer Vision, ICCV 2015
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 11 December 2015 through 18 December 2015
ER -
