Modeling multi-way data with linearly dependent loadings

Rasmus Bro, Richard A. Harshman, Nicholas D. Sidiropoulos, Margaret E. Lundy

Research output: Contribution to journal › Article › peer-review

75 Scopus citations

Abstract

A generalization/specialization of the PARAFAC model is developed that improves its properties when applied to multi-way problems involving linearly dependent factors. This model is called PARALIND (PARAllel profiles with LINear Dependences). Linear dependences can arise when the empirical sources of variation being modeled by factors are causally or logically linked during data generation, or circumstantially linked during data collection. For example, this can occur in a chemical context when end products are related to the precursor or in a psychological context when a single stimulus generates two incompatible feelings at once. For such cases, the most theoretically appropriate PARAFAC model has loading vectors that are linearly dependent in at least one mode, and when collinear, are nonunique in the others. However, standard PARAFAC analysis of fallible data will have neither of these features. Instead, latent linear dependences become high surface correlations and any latent nonuniqueness is replaced by a meaningless surface-level 'unique orientation' that optimally fits the particular random noise in that sample. To avoid these problems, any set of components that in theory should be rank deficient is re-expressed in PARALIND as a product of two matrices, one that explicitly represents their dependency relationships and another, with fewer columns, that captures their patterns of variation. To demonstrate the approach, we apply it first to fluorescence spectroscopy (excitation-emission matrices, EEM) data in which concentration values for two analytes covary exactly, and then to flow injection analysis (FIA) data in which subsets of columns are logically constrained to sum to a constant, but differently in each of two modes. In the PARAFAC solutions of the EEM data, all factors are 'unique' but this is only meaningful for two of the factors that are also unique at the latent level. In contrast, the PARALIND solutions directly display the extent and nature of partial nonuniqueness present at the latent level by exhibiting a corresponding partial uniqueness in their recovered loadings. For the FIA data, PARALIND constraints restore latent uniqueness to the concentration estimates. Comparison of the solutions shows that PARALIND more accurately recovers latent structure, presumably because it uses fewer parameters and hence fits less error.
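The factorization described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' estimation procedure (no ALS fitting is performed); it only builds a synthetic three-way array whose first-mode loadings are the product of a reduced loading matrix and a dependency matrix, then checks the three properties the abstract mentions: rank deficiency at the latent level, nonuniqueness in the other modes, and the way noise turns an exact dependence into a high correlation. All names (`A_reduced`, `H`, `B`, `C`), the dimensions, the noise level, and the particular dependency pattern are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 6 x 5 x 4 array, S = 3 components,
# but only R = 2 distinct patterns of variation in the first mode.
I, J, K = 6, 5, 4
S, R = 3, 2

# Reduced loading matrix: fewer columns, one per distinct pattern.
A_reduced = rng.random((I, R))

# Dependency matrix (illustrative): components 2 and 3 share pattern 2.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])

# Full first-mode loadings are the product of the two matrices.
A_full = A_reduced @ H            # shape (I, S), rank R
B = rng.random((J, S))
C = rng.random((K, S))

# Trilinear (PARAFAC-type) reconstruction of the three-way array.
X = np.einsum("ir,jr,kr->ijk", A_full, B, C)

# 1) Latent level: the first-mode loadings are rank deficient.
rank_latent = np.linalg.matrix_rank(A_full)

# 2) Nonuniqueness: because columns 2 and 3 of A_full are collinear,
#    the matching columns of B and C can be transformed by any
#    invertible T without changing the modeled array.
T = np.array([[2.0, 1.0],
              [0.5, 3.0]])
B_alt, C_alt = B.copy(), C.copy()
B_alt[:, 1:] = B[:, 1:] @ T
C_alt[:, 1:] = C[:, 1:] @ np.linalg.inv(T).T
X_alt = np.einsum("ir,jr,kr->ijk", A_full, B_alt, C_alt)
same_fit = np.allclose(X, X_alt)

# 3) Surface level: a little noise makes the loadings full rank,
#    and the exact dependence shows up only as a high correlation.
A_noisy = A_full + 1e-3 * rng.standard_normal(A_full.shape)
rank_noisy = np.linalg.matrix_rank(A_noisy)
corr_23 = np.corrcoef(A_noisy[:, 1], A_noisy[:, 2])[0, 1]

print("latent rank:", rank_latent)
print("alternative B, C give same array:", same_fit)
print("noisy rank:", rank_noisy)
print("correlation of noisy columns 2 and 3:", corr_23)
```

Writing the loadings as `A_reduced @ H` fixes the dependency structure by construction, so a fit parameterized this way would estimate only the `R` columns of `A_reduced` rather than `S` nearly collinear columns.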

Original language: English (US)
Pages (from-to): 324-340
Number of pages: 17
Journal: Journal of Chemometrics
Volume: 23
Issue number: 7-8
State: Published - Jul 1 2009

Keywords

  • Constrained ALS estimation
  • Equality constraints
  • Linear dependence
  • Multi-mode factor analysis
  • PARAFAC
  • PARATUCK2
  • Uniqueness
