In microarray data there are a number of biological samples, each assessed for the level of gene expression for a typically large number of genes. There is a need to examine these data with statistical techniques to help discern possible patterns in the data. Our technique applies a combination of mathematical and statistical methods to progressively take the data set apart so that different aspects can be examined for both general patterns and very specific effects. Unfortunately, these data tables are often corrupted with extreme values (outliers), missing values, and non-normal distributions that preclude standard analysis. We develop a robust analysis method to address these problems. The benefits of this robust analysis will be both the understanding of large-scale shifts in gene effects and the isolation of particular sample-by-gene effects that might be either unusual interactions or the result of experimental flaws. Our method requires a single pass and does not resort to complex "cleaning" or imputation of the data table before analysis. We illustrate the method with a commercial data set.
|Original language||English (US)|
|Number of pages||6|
|Journal||Proceedings of the National Academy of Sciences of the United States of America|
|State||Published - Nov 11 2003|