TY - GEN
T1 - Predicting behaviors of basketball players from first person videos
AU - Su, Shan
AU - Hong, Jung Pyo
AU - Shi, Jianbo
AU - Park, Hyun Soo
PY - 2017/11/6
Y1 - 2017/11/6
AB - This paper presents a method to predict the future movements (location and gaze direction) of basketball players as a whole from their first person videos. The predicted behaviors reflect an individual's physical space that affords the next actions while conforming to social behaviors by engaging in joint attention. Our key innovation is to use the 3D reconstruction of multiple first person cameras to automatically annotate each other's visual semantics of social configurations. We leverage two learning signals uniquely embedded in first person videos. Individually, a first person video records the visual semantics of the spatial and social layout around a person, which allows it to be associated with similar past situations. Collectively, first person videos follow joint attention, which can link individuals to a group. We learn the egocentric visual semantics of group movements using a Siamese neural network to retrieve future trajectories. We consolidate the retrieved trajectories from all players by maximizing a measure of social compatibility: the gaze alignment towards the joint attention predicted by their social formation, where the dynamics of joint attention are learned by a long-term recurrent convolutional network. This allows us to characterize which social configuration is more plausible and predict future group trajectories.
UR - http://www.scopus.com/inward/record.url?scp=85041901026&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85041901026&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2017.133
DO - 10.1109/CVPR.2017.133
M3 - Conference contribution
AN - SCOPUS:85041901026
T3 - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
SP - 1206
EP - 1215
BT - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Y2 - 21 July 2017 through 26 July 2017
ER -