Visual attention in multi-label image classification

Yan Luo; Ming Jiang; Qi Zhao

doi:10.1109/CVPRW.2019.00110


Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Scopus citations

Abstract

One of the most significant challenges in multi-label image classification is learning representative features that capture the rich semantic information in a cluttered scene. As an information bottleneck, the visual attention mechanism allows humans to selectively process the most important visual input, enabling rapid and accurate scene understanding. In this work, we study the correlation between visual attention and multi-label image classification, and exploit an extra attention pathway to improve multi-label classification performance. Specifically, we propose a dual-stream neural network that consists of two sub-networks: one is a conventional classification model and the other is a saliency prediction model trained with human fixations. The two sub-networks are first trained separately, and the features they compute are then fine-tuned jointly using a multiple cross-entropy loss. Experimental results show that the additional saliency sub-network improves multi-label image classification performance on the MS COCO dataset, and the improvement is consistent across various levels of scene clutter.
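The abstract describes the fusion and the training objective only at a high level. The Python sketch that follows is a minimal illustration under two assumptions that the abstract does not state: the features of the classification stream and the saliency stream are fused by plain concatenation, and the "multiple cross entropy loss" is read as one binary cross-entropy term per label on sigmoid outputs, averaged over labels. The helper names `fuse`, `label_logits`, `bce_with_logits`, and `multi_label_bce`, and the toy weights, are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Sequence

# Hypothetical sketch, not the authors' code.
# Assumption 1: the classification-stream and saliency-stream feature
# vectors are fused by plain concatenation.
# Assumption 2: the "multiple cross entropy loss" is one binary
# cross-entropy term per label on sigmoid outputs, averaged over labels.


def fuse(cls_feat: Sequence[float], sal_feat: Sequence[float]) -> List[float]:
    """Concatenate the two streams' feature vectors (assumed fusion)."""
    return list(cls_feat) + list(sal_feat)


def label_logits(x: Sequence[float],
                 weights: Sequence[Sequence[float]],
                 bias: Sequence[float]) -> List[float]:
    """Linear head on the fused features: one logit per label, w_k . x + b_k."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(weights, bias)]


def bce_with_logits(z: float, y: float) -> float:
    """Stable form of -[y log s(z) + (1 - y) log(1 - s(z))], s = sigmoid."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def multi_label_bce(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Average the per-label binary cross-entropy terms."""
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    return sum(bce_with_logits(z, y) for z, y in zip(logits, targets)) / len(logits)


if __name__ == "__main__":
    # Toy example: 2-d classification features, 1-d saliency feature, 2 labels.
    fused = fuse([1.0, 0.0], [0.5])
    logits = label_logits(fused, [[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]], [0.0, -1.0])
    print(logits)  # [1.0, 0.0]
    print(multi_label_bce(logits, [1, 0]))
```

Under the abstract's description, each sub-network would first be trained on its own task before a fused head of this kind is fine-tuned jointly; the sketch covers only the forward fusion and the loss, not training.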

Original language	English (US)
Title of host publication	Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019
Publisher	IEEE Computer Society
Pages	820-827
Number of pages	8
ISBN (Electronic)	9781728125060
DOIs	https://doi.org/10.1109/CVPRW.2019.00110
State	Published - Jun 2019
Event	32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019, Long Beach, United States (Jun 16 2019 → Jun 20 2019)

Publication series

Name	IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
Volume	2019-June
ISSN (Print)	2160-7508
ISSN (Electronic)	2160-7516

Conference

Conference	32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019
Country/Territory	United States
City	Long Beach
Period	6/16/19 → 6/20/19

Bibliographical note

Funding Information:
This research was funded by the NSF under Grants 1849107 and 1763761, and the University of Minnesota Department of Computer Science and Engineering Start-up Fund (QZ).

Publisher Copyright:
© 2019 IEEE.

Cite this

Luo, Y., Jiang, M., & Zhao, Q. (2019). Visual attention in multi-label image classification. In Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019 (pp. 820-827). Article 9025569 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops; Vol. 2019-June). IEEE Computer Society. https://doi.org/10.1109/CVPRW.2019.00110

Visual attention in multi-label image classification. / Luo, Yan; Jiang, Ming; Zhao, Qi.
Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019. IEEE Computer Society, 2019. p. 820-827, 9025569 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops; Vol. 2019-June).

Luo, Y, Jiang, M & Zhao, Q 2019, Visual attention in multi-label image classification. in Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019, 9025569, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, vol. 2019-June, IEEE Computer Society, pp. 820-827, 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019, Long Beach, United States, 6/16/19. https://doi.org/10.1109/CVPRW.2019.00110

@inproceedings{fbcf87a3738f47ecbc91ebb4f7172ee4,

title = "Visual attention in multi-label image classification",

author = "Yan Luo and Ming Jiang and Qi Zhao",

note = "Funding Information: This research was funded by the NSF under Grants 1849107 and 1763761, and the University of Minnesota Department of Computer Science and Engineering Start-up Fund (QZ). Publisher Copyright: {\textcopyright} 2019 IEEE.; 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019 ; Conference date: 16-06-2019 Through 20-06-2019",

year = "2019",

month = jun,

doi = "10.1109/CVPRW.2019.00110",

language = "English (US)",

series = "IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops",

publisher = "IEEE Computer Society",

pages = "820--827",

booktitle = "Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019",

}

TY  - GEN

T1  - Visual attention in multi-label image classification

AU  - Luo, Yan

AU  - Jiang, Ming

AU  - Zhao, Qi

N1  - Funding Information: This research was funded by the NSF under Grants 1849107 and 1763761, and the University of Minnesota Department of Computer Science and Engineering Start-up Fund (QZ). Publisher Copyright: © 2019 IEEE.

PY  - 2019/6

Y1  - 2019/6

AB  - One of the most significant challenges in multi-label image classification is the learning of representative features that capture the rich semantic information in a cluttered scene. As an information bottleneck, the visual attention mechanism allows humans to selectively process the most important visual input, enabling rapid and accurate scene understanding. In this work, we study the correlation between visual attention and multi-label image classification, and exploit an extra attention pathway for improving multi-label image classification performance. Specifically, we propose a dual-stream neural network that consists of two sub-networks: one is a conventional classification model and the other is a saliency prediction model trained with human fixations. Features computed with the two sub-networks are trained separately and then fine-tuned jointly using a multiple cross entropy loss. Experimental results show that the additional saliency sub-network improves multi-label image classification performance on the MS COCO dataset. The improvement is consistent across various levels of scene clutterness.

UR  - http://www.scopus.com/inward/record.url?scp=85083318791&partnerID=8YFLogxK

UR  - http://www.scopus.com/inward/citedby.url?scp=85083318791&partnerID=8YFLogxK

U2  - 10.1109/CVPRW.2019.00110

DO  - 10.1109/CVPRW.2019.00110

M3  - Conference contribution

AN  - SCOPUS:85083318791

T3  - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

SP  - 820

EP  - 827

BT  - Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019

PB  - IEEE Computer Society

T2  - 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019

Y2  - 16 June 2019 through 20 June 2019

ER  -
