In human object recognition, converging evidence shows that subjects' performance depends on their familiarity with an object's appearance, and that the extent of this dependence scales with inter-object similarity: the more similar the objects, the stronger the dependence and the more dominant the two-dimensional (2D), image-based information. The degree to which three-dimensional (3D), model-based information is also used, however, remains strongly debated. The authors previously showed that no model built from independent 2D templates, even one allowing 2D rotations in the image plane, can account for human performance in discriminating novel object views. Here the authors derive an analytic formulation of a Bayesian model that achieves the best possible performance under 2D affine transformations and demonstrate that this model, too, cannot account for human performance in 3D object discrimination. Relative to this model, human statistical efficiency is higher for novel views than for learned views, suggesting that human observers use some 3D structural information.
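The efficiency comparison described above rests on two standard ingredients: statistical efficiency, conventionally defined as the squared ratio of human to ideal sensitivity (d'), and the family of 2D affine transformations (x → Ax + t) under which the ideal observer matches templates. A minimal sketch of both, using hypothetical d' values and an arbitrary rotation-plus-shear transform chosen purely for illustration:

```python
import numpy as np

def statistical_efficiency(d_prime_human, d_prime_ideal):
    """Efficiency = (d'_human / d'_ideal)^2."""
    return (d_prime_human / d_prime_ideal) ** 2

def affine_transform(points, A, t):
    """Apply the 2D affine map x -> A x + t to an (N, 2) array of points."""
    return points @ A.T + t

# Hypothetical sensitivities, for illustration only (not data from the study)
eff = statistical_efficiency(d_prime_human=1.2, d_prime_ideal=3.0)
print(round(eff, 2))  # 0.16

# A square template warped by a rotation composed with a shear, then translated
template = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
theta = np.pi / 6
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
shear = np.array([[1.0, 0.3],
                  [0.0, 1.0]])
A = rotation @ shear
t = np.array([2.0, -1.0])
warped = affine_transform(template, A, t)
```

An ideal observer in this setting searches over all such (A, t) pairs for the best template match, which is what makes its performance an upper bound for any purely 2D strategy.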
Bibliographical note

Funding Information:
DK was supported by a grant from the National Science Foundation, contract number SBR-9631682. We thank Ronen Basri, David Jacobs, David Knill, Michael Langer, Pascal Mamassian, Bosco Tjan, Daphna Weinshall, the anonymous reviewers and in particular, John Oliensis, for many helpful discussions. Weinshall pointed out to us the Werman–Weinshall theorem. Part of this work was presented at the Hong Kong International Workshop on ‘Theoretical Aspects of Neural Computation,’ Hong Kong University of Science and Technology, 1997; European Conference on Visual Perception (ECVP), Helsinki, Finland, 1997; ‘Neural Information Processing’ (NIPS), Denver, Colorado, 1997; and ‘International Conference on Computer Vision’ (ICCV), Mumbai, India, 1998.
- Affine transformation
- Ideal observer
- Object recognition
- Object representation
- Template matching