A structure-guided approach to the prediction of natural image saliency

Haoran Liang; Ming Jiang; Ronghua Liang; Qi Zhao

doi:10.1016/j.neucom.2019.09.085

Computer Science and Engineering

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

The structure of a scene provides global contextual information in directing gaze and complements local object information in saliency prediction. In this study, we explore how visual attention can be affected by scene structures, namely openness, depth and perspective. We first build an eye-tracking dataset with 2500 natural scene images and collect gaze data via both eye tracking and mouse tracking. We make observations on scene layout properties and propose a set of scene structural features relating to visual attention. This set of complementary features is then integrated for saliency prediction. Our features are independent of, and can work together with, many computational modules, and this work demonstrates the use of multiple kernel learning (MKL) as an example of integrating the features at low and high levels. Experimental results demonstrate that our model outperforms existing methods and that our scene structural features can improve the performance of other saliency models in outdoor scenes.

Original language	English (US)
Pages (from-to)	441-454
Number of pages	14
Journal	Neurocomputing
Volume	378
DOIs	https://doi.org/10.1016/j.neucom.2019.09.085
State	Published - Feb 22 2020

Bibliographical note

Funding Information:
This work is supported by the National Science Foundation of China under grant 61702457 and grant 61871350, and by a University of Minnesota Department of Computer Science and Engineering Start-up Fund (QZ).

Haoran Liang is currently a research assistant at the Department of Information Engineering, Zhejiang University of Technology. He received his Ph.D. degree in control science and engineering from Zhejiang University of Technology in January 2017. His research interests include computer vision, biologically inspired vision and deep learning.

Ming Jiang received his Ph.D. degree from the National University of Singapore, working with Dr. Zhao, and obtained B.Sc. and M.Eng. degrees in Computer Science from Zhejiang University, China. He is currently a postdoctoral researcher at the University of Minnesota. His research aims to understand the neural mechanism of selective visual attention and to build attentional systems that predict where humans look in natural environments.

Ronghua Liang received the B.Sc. degree from Hangdian University, Hangzhou, China, in 1996, and the Ph.D. degree in computer science from Zhejiang University, Hangzhou, China, in 2003. He worked as a Research Fellow with the University of Bedfordshire, Bedfordshire, U.K., from April 2004 to July 2005, and as a Visiting Scholar at the University of California, Davis, CA, USA, from March 2010 to March 2011. He is currently a Professor of computer science and the Executive Dean of the College of Computer Science at Zhejiang University of Technology. His research interests include computer vision, information visualization, and medical visualization.

Qi Zhao is an assistant professor in the Department of Computer Science and Engineering at the University of Minnesota, Twin Cities. Her main research interests include computer vision, machine learning, cognitive neuroscience, and mental disorders. She received her Ph.D. in computer engineering from the University of California, Santa Cruz in 2009. She was a postdoctoral researcher in Computation and Neural Systems and the Division of Biology at the California Institute of Technology from 2009 to 2011. Prior to joining the University of Minnesota, she was an assistant professor in the Department of Electrical and Computer Engineering and the Department of Ophthalmology at the National University of Singapore. She has published more than 40 journal and conference papers in top computer vision, machine learning, and cognitive neuroscience venues, and edited a book with Springer, titled Computational and Cognitive Neuroscience of Vision, that provides a systematic and comprehensive overview of vision from various perspectives, ranging from neuroscience to cognition, and from computational principles to engineering developments. She is a member of the IEEE.

Publisher Copyright:
© 2019

Keywords

Eye-tracking dataset
Scene structure
Visual saliency

Cite this

@article{f0587674a04743d59a2be4fae408eb6f,

title = "A structure-guided approach to the prediction of natural image saliency",

abstract = "The structure of a scene provides global contextual information in directing gaze and complements local object information in saliency prediction. In this study, we explore how visual attention can be affected by scene structures, namely openness, depth and perspective. We first build an eye tracking dataset with 2500 natural scene images and collect gaze data via both eye tracking and mouse tracking. We make observations on scene layout properties and propose a set of scene structural features relating to visual attention. The set of complementary features are then integrated for saliency prediction. Our features are independent of and can work together with many computational modules, and this work demonstrates the use of Multiple kernel learning (MKL) as an example to integrate the features at low- and high-levels. Experimental results demonstrate that our model outperforms existing methods and our scene structural features can improve the performance of other saliency models in outdoor scenes.",

keywords = "Eye-tracking dataset, Scene structure, Visual saliency",

author = "Haoran Liang and Ming Jiang and Ronghua Liang and Qi Zhao",

note = "Funding Information: This work is supported by the National Science Foundation of China under grant 61702457 and grant 61871350 , a University of Minnesota Department of Computer Science and Engineering Start-up Fund (QZ). Haoran Liang is currently a research assistant at Department of Information Engineering, Zhejiang University of Technology. He received Ph.D. degree in control science and engineering from Zhejiang University of Technology in Jan, 2017. His research interests include computer vision, biological inspired vision and deep learning. Ming Jiang received Ph.D. degree from National University of Singapore working with Dr. Zhao, and obtained B.Sc.and M.Eng. degrees in Computer Science from Zhejiang University, China. Currently he is a postdoc researcher at the University of Minnesota. His research aims to understand the neural mechanism of selective visual attention and build attentional systems to predict where humans look at in natural environment. Ronghua Liang received the B.Sc. degree from Hangdian University, Hangzhou, China, in 1996, and the Ph.D. degree in computer science from Zhejiang University, Hangzhou, China, in 2003. He worked as a Research Fellow with the University of Bedfordshire, Bedfordshire, U.K., from April 2004 to July 2005, and as a Visiting Scholar at the University of California, Davis, CA, USA, from March 2010 to March 2011. He is currently a Professor of computer science and the Executive Dean of College of computer science with Zhejiang University of Technology. His research interests include computer vision, information visualization, and medical visualization. Qi Zhao is an assistant professor in the Department of Computer Science and Engineering at the University of Minnesota, Twin Cities. Her main research interests include computer vision, machine learning, cognitive neuroscience, and mental disorders. She received her Ph.D. in computer engineering from the University of California, Santa Cruz in 2009. 
She was a postdoctoral researcher in the Computation and Neural Systems, and Division of Biology at the California Institute of Technology from 2009 to 2011. Prior to joining the University of Minnesota, Qi was an assistant professor in the Department of Electrical and Computer Engineering and the Department of Ophthalmology at the National University of Singapore. She has published more than 40 journal and conference papers in top computer vision, machine learning, and cognitive neuroscience venues, and edited a book with Springer, titled Computational and Cognitive Neuroscience of Vision, that provides a systematic and comprehensive overview of vision from various perspectives, ranging from neuroscience to cognition, and from computational principles to engineering developments. She is a member of the IEEE. Publisher Copyright: {\textcopyright} 2019",

year = "2020",

month = feb,

day = "22",

doi = "10.1016/j.neucom.2019.09.085",

language = "English (US)",

volume = "378",

pages = "441--454",

journal = "Neurocomputing",

issn = "0925-2312",

publisher = "Elsevier",

}

TY - JOUR

T1 - A structure-guided approach to the prediction of natural image saliency

AU - Liang, Haoran

AU - Jiang, Ming

AU - Liang, Ronghua

AU - Zhao, Qi

N1 - Funding Information: This work is supported by the National Science Foundation of China under grant 61702457 and grant 61871350 , a University of Minnesota Department of Computer Science and Engineering Start-up Fund (QZ). Haoran Liang is currently a research assistant at Department of Information Engineering, Zhejiang University of Technology. He received Ph.D. degree in control science and engineering from Zhejiang University of Technology in Jan, 2017. His research interests include computer vision, biological inspired vision and deep learning. Ming Jiang received Ph.D. degree from National University of Singapore working with Dr. Zhao, and obtained B.Sc.and M.Eng. degrees in Computer Science from Zhejiang University, China. Currently he is a postdoc researcher at the University of Minnesota. His research aims to understand the neural mechanism of selective visual attention and build attentional systems to predict where humans look at in natural environment. Ronghua Liang received the B.Sc. degree from Hangdian University, Hangzhou, China, in 1996, and the Ph.D. degree in computer science from Zhejiang University, Hangzhou, China, in 2003. He worked as a Research Fellow with the University of Bedfordshire, Bedfordshire, U.K., from April 2004 to July 2005, and as a Visiting Scholar at the University of California, Davis, CA, USA, from March 2010 to March 2011. He is currently a Professor of computer science and the Executive Dean of College of computer science with Zhejiang University of Technology. His research interests include computer vision, information visualization, and medical visualization. Qi Zhao is an assistant professor in the Department of Computer Science and Engineering at the University of Minnesota, Twin Cities. Her main research interests include computer vision, machine learning, cognitive neuroscience, and mental disorders. She received her Ph.D. in computer engineering from the University of California, Santa Cruz in 2009. 
She was a postdoctoral researcher in the Computation and Neural Systems, and Division of Biology at the California Institute of Technology from 2009 to 2011. Prior to joining the University of Minnesota, Qi was an assistant professor in the Department of Electrical and Computer Engineering and the Department of Ophthalmology at the National University of Singapore. She has published more than 40 journal and conference papers in top computer vision, machine learning, and cognitive neuroscience venues, and edited a book with Springer, titled Computational and Cognitive Neuroscience of Vision, that provides a systematic and comprehensive overview of vision from various perspectives, ranging from neuroscience to cognition, and from computational principles to engineering developments. She is a member of the IEEE. Publisher Copyright: © 2019

PY - 2020/2/22

Y1 - 2020/2/22

N2 - The structure of a scene provides global contextual information in directing gaze and complements local object information in saliency prediction. In this study, we explore how visual attention can be affected by scene structures, namely openness, depth and perspective. We first build an eye tracking dataset with 2500 natural scene images and collect gaze data via both eye tracking and mouse tracking. We make observations on scene layout properties and propose a set of scene structural features relating to visual attention. The set of complementary features are then integrated for saliency prediction. Our features are independent of and can work together with many computational modules, and this work demonstrates the use of Multiple kernel learning (MKL) as an example to integrate the features at low- and high-levels. Experimental results demonstrate that our model outperforms existing methods and our scene structural features can improve the performance of other saliency models in outdoor scenes.

AB - The structure of a scene provides global contextual information in directing gaze and complements local object information in saliency prediction. In this study, we explore how visual attention can be affected by scene structures, namely openness, depth and perspective. We first build an eye tracking dataset with 2500 natural scene images and collect gaze data via both eye tracking and mouse tracking. We make observations on scene layout properties and propose a set of scene structural features relating to visual attention. The set of complementary features are then integrated for saliency prediction. Our features are independent of and can work together with many computational modules, and this work demonstrates the use of Multiple kernel learning (MKL) as an example to integrate the features at low- and high-levels. Experimental results demonstrate that our model outperforms existing methods and our scene structural features can improve the performance of other saliency models in outdoor scenes.

KW - Eye-tracking dataset

KW - Scene structure

KW - Visual saliency

UR - http://www.scopus.com/inward/record.url?scp=85075423631&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85075423631&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2019.09.085

DO - 10.1016/j.neucom.2019.09.085

M3 - Article

AN - SCOPUS:85075423631

SN - 0925-2312

VL - 378

SP - 441

EP - 454

JO - Neurocomputing

JF - Neurocomputing

ER -
