Classification results reported in the literature for microarray data often appear exceedingly good relative to those in most other domains of machine learning and clinical diagnostics. Yet array data are noisy and have very small sample-to-variable ratios. What explains such exemplary, yet counterintuitive, classification performance? Answering this question has significant implications (a) for the broad acceptance of such models by the medical and biostatistical communities, and (b) for gaining valuable insight into the properties of this domain. To address this problem we build several models for three classification tasks in a gene expression array dataset with 12,600 oligonucleotides and 203 patient cases. We then study the effects of classifier type (kernel-based/non-kernel-based, linear/non-linear), sample size, sample selection within cross-validation, and gene information redundancy. Our analyses show that gene redundancy and classifier choice have the strongest effects on performance. Linear bias in the classifiers and sample size (as long as kernel classifiers are used) have relatively small effects; the train-test sample ratio and the choice of cross-validation sample selection method appear to have small-to-negligible effects.
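As a hypothetical illustration (not the paper's actual experimental setup), the core evaluation protocol described above — estimating classification accuracy by k-fold cross-validation on data with far more variables than samples — can be sketched in plain NumPy. The nearest-centroid classifier, the synthetic data, and all parameter values here are illustrative assumptions:

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, X_test):
    # Fit one centroid per class, then assign each test sample
    # to the class with the closest centroid (Euclidean distance).
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
    d = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kfold_accuracy(X, y, k=5, seed=0):
    # Randomly partition samples into k folds; train on k-1 folds,
    # test on the held-out fold, and average the fold accuracies.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = nearest_centroid_predict(X[train], y[train], X[test])
        accs.append((pred == y[test]).mean())
    return float(np.mean(accs))

# Synthetic data with p >> n, mimicking the small sample-to-variable
# ratio of array studies (sizes chosen for illustration only).
rng = np.random.default_rng(1)
n, p = 60, 2000
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :50] += 1.0  # a small block of informative "genes"

print(kfold_accuracy(X, y))
```

Because the fold assignment depends on the random permutation, repeating the estimate over several seeds gives a sense of how much the cross-validation sample selection itself perturbs the reported accuracy.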