TY - JOUR
T1 - Modeling multi-way data with linearly dependent loadings
AU - Bro, Rasmus
AU - Harshman, Richard A.
AU - Sidiropoulos, Nicholas D.
AU - Lundy, Margaret E.
PY - 2009
Y1 - 2009
N2 - A generalization/specialization of the PARAFAC model is developed that improves its properties when applied to multi-way problems involving linearly dependent factors. This model is called PARALIND (PARAllel profiles with LINear Dependences). Linear dependences can arise when the empirical sources of variation being modeled by factors are causally or logically linked during data generation, or circumstantially linked during data collection. For example, this can occur in a chemical context when end products are related to the precursor or in a psychological context when a single stimulus generates two incompatible feelings at once. For such cases, the most theoretically appropriate PARAFAC model has loading vectors that are linearly dependent in at least one mode, and when collinear, are nonunique in the others. However, standard PARAFAC analysis of fallible data will have neither of these features. Instead, latent linear dependences become high surface correlations and any latent nonuniqueness is replaced by a meaningless surface-level 'unique orientation' that optimally fits the particular random noise in that sample. To avoid these problems, any set of components that in theory should be rank deficient are re-expressed in PARALIND as a product of two matrices, one that explicitly represents their dependency relationships and another, with fewer columns, that captures their patterns of variation. To demonstrate the approach, we apply it first to fluorescence spectroscopy (excitation-emission matrices, EEM) data in which concentration values for two analytes covary exactly, and then to flow injection analysis (FIA) data in which subsets of columns are logically constrained to sum to a constant, but differently in each of two modes. In the PARAFAC solutions of the EEM data, all factors are 'unique' but this is only meaningful for two of the factors that are also unique at the latent level. In contrast, the PARALIND solutions directly display the extent and nature of partial nonuniqueness present at the latent level by exhibiting a corresponding partial uniqueness in their recovered loadings. For the FIA data, PARALIND constraints restore latent uniqueness to the concentration estimates. Comparison of the solutions shows that PARALIND more accurately recovers latent structure, presumably because it uses fewer parameters and hence fits less error.
AB - A generalization/specialization of the PARAFAC model is developed that improves its properties when applied to multi-way problems involving linearly dependent factors. This model is called PARALIND (PARAllel profiles with LINear Dependences). Linear dependences can arise when the empirical sources of variation being modeled by factors are causally or logically linked during data generation, or circumstantially linked during data collection. For example, this can occur in a chemical context when end products are related to the precursor or in a psychological context when a single stimulus generates two incompatible feelings at once. For such cases, the most theoretically appropriate PARAFAC model has loading vectors that are linearly dependent in at least one mode, and when collinear, are nonunique in the others. However, standard PARAFAC analysis of fallible data will have neither of these features. Instead, latent linear dependences become high surface correlations and any latent nonuniqueness is replaced by a meaningless surface-level 'unique orientation' that optimally fits the particular random noise in that sample. To avoid these problems, any set of components that in theory should be rank deficient are re-expressed in PARALIND as a product of two matrices, one that explicitly represents their dependency relationships and another, with fewer columns, that captures their patterns of variation. To demonstrate the approach, we apply it first to fluorescence spectroscopy (excitation-emission matrices, EEM) data in which concentration values for two analytes covary exactly, and then to flow injection analysis (FIA) data in which subsets of columns are logically constrained to sum to a constant, but differently in each of two modes. In the PARAFAC solutions of the EEM data, all factors are 'unique' but this is only meaningful for two of the factors that are also unique at the latent level. In contrast, the PARALIND solutions directly display the extent and nature of partial nonuniqueness present at the latent level by exhibiting a corresponding partial uniqueness in their recovered loadings. For the FIA data, PARALIND constraints restore latent uniqueness to the concentration estimates. Comparison of the solutions shows that PARALIND more accurately recovers latent structure, presumably because it uses fewer parameters and hence fits less error.
KW - Constrained ALS estimation
KW - Equality constraints
KW - Linear dependence
KW - Multi-mode factor analysis
KW - PARAFAC
KW - PARATUCK2
KW - Uniqueness
UR - http://www.scopus.com/inward/record.url?scp=70349274902&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70349274902&partnerID=8YFLogxK
U2 - 10.1002/cem.1206
DO - 10.1002/cem.1206
M3 - Article
AN - SCOPUS:70349274902
SN - 0886-9383
VL - 23
SP - 324
EP - 340
JO - Journal of Chemometrics
JF - Journal of Chemometrics
IS - 7-8
ER -