The structure of a scene provides global contextual information that directs gaze and complements local object information in saliency prediction. In this study, we explore how visual attention is affected by scene structure, namely openness, depth, and perspective. We first build an eye-tracking dataset of 2500 natural scene images and collect gaze data via both eye tracking and mouse tracking. We make observations on scene layout properties and propose a set of scene structural features relating to visual attention. This set of complementary features is then integrated for saliency prediction. Our features are independent of, and can work together with, many computational modules; this work demonstrates the use of multiple kernel learning (MKL) as an example of integrating the features at low and high levels. Experimental results demonstrate that our model outperforms existing methods and that our scene structural features can improve the performance of other saliency models in outdoor scenes.
Bibliographical note
Funding Information:
This work is supported by the National Science Foundation of China under grants 61702457 and 61871350, and by a University of Minnesota Department of Computer Science and Engineering Start-up Fund (QZ).
Haoran Liang is currently a research assistant at the Department of Information Engineering, Zhejiang University of Technology. He received his Ph.D. degree in control science and engineering from Zhejiang University of Technology in January 2017. His research interests include computer vision, biologically inspired vision, and deep learning.
Ming Jiang received his Ph.D. degree from the National University of Singapore, working with Dr. Zhao, and obtained B.Sc. and M.Eng. degrees in Computer Science from Zhejiang University, China. He is currently a postdoctoral researcher at the University of Minnesota. His research aims to understand the neural mechanisms of selective visual attention and to build attentional systems that predict where humans look in natural environments.
Ronghua Liang received the B.Sc. degree from Hangdian University, Hangzhou, China, in 1996, and the Ph.D. degree in computer science from Zhejiang University, Hangzhou, China, in 2003. He worked as a Research Fellow with the University of Bedfordshire, Bedfordshire, U.K., from April 2004 to July 2005, and as a Visiting Scholar at the University of California, Davis, CA, USA, from March 2010 to March 2011. He is currently a Professor of computer science and the Executive Dean of the College of Computer Science at Zhejiang University of Technology. His research interests include computer vision, information visualization, and medical visualization.
Qi Zhao is an assistant professor in the Department of Computer Science and Engineering at the University of Minnesota, Twin Cities. Her main research interests include computer vision, machine learning, cognitive neuroscience, and mental disorders. She received her Ph.D. in computer engineering from the University of California, Santa Cruz in 2009. She was a postdoctoral researcher in Computation and Neural Systems and the Division of Biology at the California Institute of Technology from 2009 to 2011. Prior to joining the University of Minnesota, she was an assistant professor in the Department of Electrical and Computer Engineering and the Department of Ophthalmology at the National University of Singapore. She has published more than 40 journal and conference papers in top computer vision, machine learning, and cognitive neuroscience venues, and edited a book with Springer, titled Computational and Cognitive Neuroscience of Vision, which provides a systematic and comprehensive overview of vision from various perspectives, ranging from neuroscience to cognition, and from computational principles to engineering developments. She is a member of the IEEE.
- Eye-tracking dataset
- Scene structure
- Visual saliency