We introduce a new problem of gaze anticipation in egocentric videos. It substantially extends the conventional gaze prediction problem by no longer confining it to the current frame and instead predicting gaze on future frames. To solve this problem, we propose a new model based on generative adversarial networks, Deep Future Gaze (DFG). DFG generates multiple future frames conditioned on a single current frame and anticipates the corresponding future gazes over the next few seconds. It consists of two networks: a generator and a discriminator. The generator uses a two-stream spatial-temporal convolution architecture (3D-CNN) that explicitly untangles the foreground and the background to generate future frames. A second 3D-CNN is then attached to anticipate gaze from these synthetic frames. The discriminator plays against the generator by distinguishing the generator's synthetic frames from real frames. Through competition with the discriminator, the generator progressively improves the quality of the future frames and thus anticipates future gaze better. Experimental results on publicly available egocentric datasets show that DFG significantly outperforms all well-established baselines. Moreover, we demonstrate that DFG achieves better gaze prediction performance on current frames than state-of-the-art methods. This gain comes from the motion-discriminative representations learned during frame generation. We further contribute a new egocentric dataset (OST) for the object search task. DFG also achieves the best performance on this challenging dataset.
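The two-stream idea in the abstract can be illustrated with a minimal sketch. It assumes that the foreground and background streams are merged per pixel by a soft mask, `frame = m * fg + (1 - m) * bg`, a composition commonly used in two-stream video generators; the abstract does not state the exact rule DFG uses, so the function `compose_frames` and all names and shapes below are hypothetical. Plain Python lists stand in for tensors.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame as rows of pixel intensities


def compose_frames(foreground: List[Frame], mask: List[Frame], background: Frame) -> List[Frame]:
    """Blend a moving foreground with a static background, frame by frame.

    Assumption (not stated in the abstract): out = m * fg + (1 - m) * bg,
    where the mask m lies in [0, 1] and the background is shared across time.
    """
    if len(foreground) != len(mask):
        raise ValueError("foreground and mask need the same number of frames")
    video = []
    for fg_t, m_t in zip(foreground, mask):
        # Per-pixel convex combination of foreground and background.
        frame = [
            [m * f + (1.0 - m) * b for f, m, b in zip(fg_row, m_row, bg_row)]
            for fg_row, m_row, bg_row in zip(fg_t, m_t, background)
        ]
        video.append(frame)
    return video


# Toy example: two future frames, each 1 row by 2 pixels.
background = [[0.0, 0.0]]
foreground = [[[1.0, 1.0]], [[1.0, 1.0]]]
mask = [[[1.0, 0.0]], [[0.0, 0.5]]]
future = compose_frames(foreground, mask, background)
```

In a full model the three inputs would come from learned 3D-CNN streams, and the composed frames would feed both the discriminator and the gaze anticipation network.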
|Original language||English (US)|
|Title of host publication||Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||10|
|State||Published - Nov 6 2017|
|Event||30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 - Honolulu, United States|
|Duration||Jul 21 2017 → Jul 26 2017|
Bibliographical note
Funding Information:
This work was supported by the Reverse Engineering Visual Intelligence for cognitive Enhancement (REVIVE) programme funded by the Joint Council Office of A*STAR, National University of Singapore startup grant R-263-000-C08-133, and Ministry of Education of Singapore AcRF Tier One grant R-263-000-C21-112. We would also like to thank Yin Li for his help in replicating the experimental setup in .
© 2017 IEEE.