Deep future gaze: Gaze anticipation on egocentric videos using adversarial networks

Mengmi Zhang; Keng Teck Ma; Joo Hwee Lim; Qi Zhao; Jiashi Feng

doi:10.1109/CVPR.2017.377

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

76 Scopus citations

Abstract

We introduce a new problem of gaze anticipation on egocentric videos. This substantially extends the conventional gaze prediction problem to future frames by no longer confining it to the current frame. To solve this problem, we propose a new model based on generative adversarial networks, Deep Future Gaze (DFG). DFG generates multiple future frames conditioned on the single current frame and anticipates the corresponding future gazes over the next few seconds. It consists of two networks: a generator and a discriminator. The generator uses a two-stream spatial-temporal convolution architecture (3D-CNN) that explicitly untangles the foreground and the background to generate future frames. It then attaches another 3D-CNN for gaze anticipation based on these synthetic frames. The discriminator plays against the generator by distinguishing the generator's synthetic frames from real frames. Through competition with the discriminator, the generator progressively improves the quality of the future frames and thus anticipates future gaze better. Experimental results on publicly available egocentric datasets show that DFG significantly outperforms all well-established baselines. Moreover, we demonstrate that DFG achieves better gaze prediction performance on current frames than state-of-the-art methods, which we attribute to the motion-discriminative representations it learns during frame generation. We further contribute a new egocentric dataset (OST) for the object search task. DFG also achieves the best performance on this challenging dataset.
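The abstract describes the generator only at a high level. The plain-Python sketch below illustrates one common way a two-stream generator can "untangle" foreground and background: a foreground stream yields per-frame content plus a mask in [0, 1], a background stream yields a single static frame, and the two are blended frame by frame. This mask-based blend follows earlier video-generation GANs such as VGAN and is an assumption here, not a detail stated in the abstract. Reading the anticipated gaze as the peak of one heatmap per future frame is likewise assumed. `compose_two_stream`, `gaze_points` and the toy list-of-lists layout are hypothetical stand-ins for the 3D-CNN tensors a real model would use.

```python
from typing import List, Tuple

# Toy layout (assumption): a clip is clip[t][row][col] of scalar intensities.
# A real model would use RGB tensors produced by 3D convolutions.
Frame = List[List[float]]
Clip = List[Frame]


def compose_two_stream(foreground: Clip, mask: Clip, background: Frame) -> Clip:
    """Blend a moving foreground clip with one static background frame.

    Hypothetical helper illustrating a VGAN-style composition:
        video[t] = mask[t] * foreground[t] + (1 - mask[t]) * background
    Mask values are expected in [0, 1] (e.g. a sigmoid output).
    """
    if len(foreground) != len(mask):
        raise ValueError("foreground and mask need the same number of frames")
    video: Clip = []
    for fg_frame, m_frame in zip(foreground, mask):
        # The same background frame is reused at every time step.
        video.append([
            [m * f + (1.0 - m) * b for f, m, b in zip(fg_row, m_row, bg_row)]
            for fg_row, m_row, bg_row in zip(fg_frame, m_frame, background)
        ])
    return video


def gaze_points(heatmaps: Clip) -> List[Tuple[int, int]]:
    """Return the (row, col) of the peak of each per-frame gaze heatmap.

    Hypothetical read-out: assumes the gaze branch emits one heatmap per
    future frame; ties resolve to the first maximum in row-major order.
    """
    points: List[Tuple[int, int]] = []
    for heatmap in heatmaps:
        _, row, col = max(
            ((value, r, c) for r, line in enumerate(heatmap)
             for c, value in enumerate(line)),
            key=lambda item: item[0],
        )
        points.append((row, col))
    return points


# Usage: two 1x2 frames; the mask moves the "object" from left to right.
foreground = [[[1.0, 1.0]], [[1.0, 1.0]]]
mask = [[[1.0, 0.0]], [[0.0, 1.0]]]
background = [[0.0, 0.5]]
print(compose_two_stream(foreground, mask, background))  # [[[1.0, 0.5]], [[0.0, 1.0]]]

# Toy gaze heatmaps for two future frames (2x2 each).
heatmaps = [[[0.1, 0.9], [0.3, 0.2]], [[0.5, 0.1], [0.0, 0.4]]]
print(gaze_points(heatmaps))  # [(0, 1), (0, 0)]
```

Keeping the background as a single frame reflects the intuition that, over a few seconds, most egocentric motion is carried by the foreground stream; whether DFG shares this exact choice should be checked against the full paper.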

Original language	English (US)
Title of host publication	Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	3539-3548
Number of pages	10
ISBN (Electronic)	9781538604571
DOIs	https://doi.org/10.1109/CVPR.2017.377
State	Published - Nov 6 2017
Event	30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 - Honolulu, United States Duration: Jul 21 2017 → Jul 26 2017

Publication series

Name	Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Volume	2017-January

Other

Other	30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Country/Territory	United States
City	Honolulu
Period	7/21/17 → 7/26/17

Bibliographical note

Funding Information:
This work was supported by the Reverse Engineering Visual Intelligence for cognitive Enhancement (REVIVE) programme, funded by the Joint Council Office of A*STAR; by National University of Singapore startup grant R-263-000-C08-133; and by Ministry of Education of Singapore AcRF Tier One grant R-263-000-C21-112. We would also like to thank Yin Li for his help in replicating the experimental setup in [25].

Publisher Copyright:
© 2017 IEEE.

Cite this

Zhang, M., Ma, K. T., Lim, J. H., Zhao, Q., & Feng, J. (2017). Deep future gaze: Gaze anticipation on egocentric videos using adversarial networks. In Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 (pp. 3539-3548). (Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017; Vol. 2017-January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CVPR.2017.377

@inproceedings{a74a6463777049f6a960b1ebd0e7366f,

title = "Deep future gaze: Gaze anticipation on egocentric videos using adversarial networks",

abstract = "We introduce a new problem of gaze anticipation on egocentric videos. This substantially extends the conventional gaze prediction problem to future frames by no longer confining it to the current frame. To solve this problem, we propose a new model based on generative adversarial networks, Deep Future Gaze (DFG). DFG generates multiple future frames conditioned on the single current frame and anticipates the corresponding future gazes over the next few seconds. It consists of two networks: a generator and a discriminator. The generator uses a two-stream spatial-temporal convolution architecture (3D-CNN) that explicitly untangles the foreground and the background to generate future frames. It then attaches another 3D-CNN for gaze anticipation based on these synthetic frames. The discriminator plays against the generator by distinguishing the generator's synthetic frames from real frames. Through competition with the discriminator, the generator progressively improves the quality of the future frames and thus anticipates future gaze better. Experimental results on publicly available egocentric datasets show that DFG significantly outperforms all well-established baselines. Moreover, we demonstrate that DFG achieves better gaze prediction performance on current frames than state-of-the-art methods, which we attribute to the motion-discriminative representations it learns during frame generation. We further contribute a new egocentric dataset (OST) for the object search task. DFG also achieves the best performance on this challenging dataset.",

author = "Mengmi Zhang and Ma, {Keng Teck} and Lim, {Joo Hwee} and Qi Zhao and Jiashi Feng",

note = "Funding Information: This work was supported by the Reverse Engineering Visual Intelligence for cognitive Enhancement (REVIVE) programme, funded by the Joint Council Office of A*STAR; by National University of Singapore startup grant R-263-000-C08-133; and by Ministry of Education of Singapore AcRF Tier One grant R-263-000-C21-112. We would also like to thank Yin Li for his help in replicating the experimental setup in [25]. Publisher Copyright: {\textcopyright} 2017 IEEE.; 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 ; Conference date: 21-07-2017 Through 26-07-2017",

year = "2017",

month = nov,

day = "6",

doi = "10.1109/CVPR.2017.377",

language = "English (US)",

series = "Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "3539--3548",

booktitle = "Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017",

}

TY - GEN

T1 - Deep future gaze

T2 - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017

AU - Zhang, Mengmi

AU - Ma, Keng Teck

AU - Lim, Joo Hwee

AU - Zhao, Qi

AU - Feng, Jiashi

N1 - Funding Information: This work was supported by the Reverse Engineering Visual Intelligence for cognitive Enhancement (REVIVE) programme, funded by the Joint Council Office of A*STAR; by National University of Singapore startup grant R-263-000-C08-133; and by Ministry of Education of Singapore AcRF Tier One grant R-263-000-C21-112. We would also like to thank Yin Li for his help in replicating the experimental setup in [25]. Publisher Copyright: © 2017 IEEE.

PY - 2017/11/6

Y1 - 2017/11/6

AB - We introduce a new problem of gaze anticipation on egocentric videos. This substantially extends the conventional gaze prediction problem to future frames by no longer confining it to the current frame. To solve this problem, we propose a new model based on generative adversarial networks, Deep Future Gaze (DFG). DFG generates multiple future frames conditioned on the single current frame and anticipates the corresponding future gazes over the next few seconds. It consists of two networks: a generator and a discriminator. The generator uses a two-stream spatial-temporal convolution architecture (3D-CNN) that explicitly untangles the foreground and the background to generate future frames. It then attaches another 3D-CNN for gaze anticipation based on these synthetic frames. The discriminator plays against the generator by distinguishing the generator's synthetic frames from real frames. Through competition with the discriminator, the generator progressively improves the quality of the future frames and thus anticipates future gaze better. Experimental results on publicly available egocentric datasets show that DFG significantly outperforms all well-established baselines. Moreover, we demonstrate that DFG achieves better gaze prediction performance on current frames than state-of-the-art methods, which we attribute to the motion-discriminative representations it learns during frame generation. We further contribute a new egocentric dataset (OST) for the object search task. DFG also achieves the best performance on this challenging dataset.

UR - http://www.scopus.com/inward/record.url?scp=85040649121&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85040649121&partnerID=8YFLogxK

U2 - 10.1109/CVPR.2017.377

DO - 10.1109/CVPR.2017.377

M3 - Conference contribution

AN - SCOPUS:85040649121

T3 - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017

SP - 3539

EP - 3548

BT - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 21 July 2017 through 26 July 2017

ER -