This paper presents a semi-supervised learning framework to train a keypoint detector using multiview image streams given a limited number of labeled instances (typically <4%). We leverage three self-supervisory signals in multiview tracking to utilize the unlabeled data: (1) a keypoint in one view can be supervised by other views via epipolar geometry; (2) a keypoint detection must be consistent across time; (3) a visible keypoint in one view is likely to be visible in the adjacent view. We design a new end-to-end network that propagates these self-supervisory signals from the labeled data to the unlabeled data in a differentiable manner. We show that our approach outperforms existing detectors, including DeepLabCut, tailored to the keypoint detection of non-human species such as monkeys, dogs, and mice.
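To make signal (1) concrete, the following is a minimal sketch of the underlying epipolar constraint, not the network described in the paper. It assumes two calibrated pinhole cameras with known intrinsics and relative pose. The function names (`skew`, `fundamental_matrix`, `epipolar_distance`, `project`) and all camera values are hypothetical and chosen only for illustration. A keypoint detected in view 1 defines a line in view 2, and the distance from the view 2 detection to that line can serve as a cross-view consistency error.

```python
import numpy as np

def skew(t):
    """Return the 3x3 cross-product matrix [t]_x of a 3-vector t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(K1, K2, R, t):
    """F with x2^T F x1 = 0, assuming camera-2 coords X2 = R @ X1 + t."""
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

def epipolar_distance(x1, x2, F):
    """Pixel distance from x2 (view 2) to the epipolar line of x1 (view 1)."""
    line = F @ np.append(x1, 1.0)  # line coefficients (a, b, c) in view 2
    return abs(line @ np.append(x2, 1.0)) / np.hypot(line[0], line[1])

def project(K, R, t, X):
    """Project a 3D point X (camera-1 coords) into a camera with pose (R, t)."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

# Illustrative setup (made-up values): identical intrinsics, pure sideways shift.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([-1.0, 0.0, 0.0])
F = fundamental_matrix(K, K, R, t)

X = np.array([0.2, -0.1, 4.0])             # a 3D keypoint
x1 = project(K, np.eye(3), np.zeros(3), X)  # its detection in view 1
x2 = project(K, R, t, X)                    # its detection in view 2

consistent_error = epipolar_distance(x1, x2, F)
shifted_error = epipolar_distance(x1, x2 + np.array([0.0, 5.0]), F)
print(consistent_error, shifted_error)
```

In this setup the epipolar lines are horizontal, so a geometrically consistent pair gives an error near zero, while moving the view 2 detection vertically increases the error by the size of the shift. An error of this kind is one plausible way an unlabeled view could be supervised by a labeled one; how the paper formulates it inside the network is not specified here.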
|Original language||English (US)|
|Title of host publication||Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||9|
|State||Published - Mar 2020|
|Event||2020 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2020 - Snowmass Village, United States|
|Duration||Mar 1 2020 → Mar 5 2020|
Bibliographical note
Funding Information: This work is supported by NSF IIS 1846031 and NSF IIS 1755895.