Modeling multi-way data with linearly dependent loadings

Rasmus Bro; Richard A. Harshman; Nicholas D. Sidiropoulos; Margaret E. Lundy

doi:10.1002/cem.1206

Research output: Contribution to journal › Article › peer-review

95 Scopus citations

Abstract

A generalization/specialization of the PARAFAC model is developed that improves its properties when applied to multi-way problems involving linearly dependent factors. This model is called PARALIND (PARAllel profiles with LINear Dependences). Linear dependences can arise when the empirical sources of variation being modeled by factors are causally or logically linked during data generation, or circumstantially linked during data collection. For example, this can occur in a chemical context when end products are related to the precursor or in a psychological context when a single stimulus generates two incompatible feelings at once. For such cases, the most theoretically appropriate PARAFAC model has loading vectors that are linearly dependent in at least one mode, and when collinear, are nonunique in the others. However, standard PARAFAC analysis of fallible data will have neither of these features. Instead, latent linear dependences become high surface correlations and any latent nonuniqueness is replaced by a meaningless surface-level 'unique orientation' that optimally fits the particular random noise in that sample. To avoid these problems, any set of components that in theory should be rank deficient are re-expressed in PARALIND as a product of two matrices, one that explicitly represents their dependency relationships and another, with fewer columns, that captures their patterns of variation. To demonstrate the approach, we apply it first to fluorescence spectroscopy (excitation-emission matrices, EEM) data in which concentration values for two analytes covary exactly, and then to flow injection analysis (FIA) data in which subsets of columns are logically constrained to sum to a constant, but differently in each of two modes. In the PARAFAC solutions of the EEM data, all factors are 'unique' but this is only meaningful for two of the factors that are also unique at the latent level. In contrast, the PARALIND solutions directly display the extent and nature of partial nonuniqueness present at the latent level by exhibiting a corresponding partial uniqueness in their recovered loadings. For the FIA data, PARALIND constraints restore latent uniqueness to the concentration estimates. Comparison of the solutions shows that PARALIND more accurately recovers latent structure, presumably because it uses fewer parameters and hence fits less error.
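
The two-matrix re-expression described in the abstract can be illustrated with a small NumPy sketch. This is only an illustration under stated assumptions, not the authors' estimation procedure: the names `A_tilde`, `H`, `B`, `C`, the array sizes, and the particular dependency pattern are all hypothetical, and the data are random. It builds a rank-deficient loading matrix as the product of a reduced matrix with fewer columns and a fixed dependency matrix, then forms a trilinear (PARAFAC-type) array from it.

```python
import numpy as np

rng = np.random.default_rng(0)

I, J, K = 6, 5, 4      # sizes of the three modes (arbitrary, for illustration)
S, R = 2, 3            # S distinct patterns of variation, R factors

# Dependency matrix (assumed for illustration): factors 2 and 3 share
# the second pattern, so their loadings in this mode covary exactly.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])

A_tilde = rng.random((I, S))   # reduced loadings, fewer columns than R
A = A_tilde @ H                # full loadings, rank deficient by construction
B = rng.random((J, R))         # loadings in the second mode
C = rng.random((K, R))         # loadings in the third mode

# Trilinear reconstruction: X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
X = np.einsum("ir,jr,kr->ijk", A, B, C)

rank_A = np.linalg.matrix_rank(A)            # at most S, never R
cols_equal = np.allclose(A[:, 1], A[:, 2])   # dependent columns coincide
print(rank_A, cols_equal)
```

Because `A` is never parameterized freely, its rank cannot exceed the number of columns of `A_tilde`, which is the sense in which the dependency is built into the model rather than left to appear as a high correlation between fitted loadings.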

Original language	English (US)
Pages (from-to)	324-340
Number of pages	17
Journal	Journal of Chemometrics
Volume	23
Issue number	7-8
DOIs	https://doi.org/10.1002/cem.1206
State	Published - 2009
Externally published	Yes

Keywords

Constrained ALS estimation
Equality constraints
Linear dependence
Multi-mode factor analysis
PARAFAC
PARATUCK2
Uniqueness

Cite this

@article{0b9ae9f29fb643839d1d5fe857557261,

title = "Modeling multi-way data with linearly dependent loadings",

abstract = "A generalization/specialization of the PARAFAC model is developed that improves its properties when applied to multi-way problems involving linearly dependent factors. This model is called PARALIND (PARAllel profiles with LINear Dependences). Linear dependences can arise when the empirical sources of variation being modeled by factors are causally or logically linked during data generation, or circumstantially linked during data collection. For example, this can occur in a chemical context when end products are related to the precursor or in a psychological context when a single stimulus generates two incompatible feelings at once. For such cases, the most theoretically appropriate PARAFAC model has loading vectors that are linearly dependent in at least one mode, and when collinear, are nonunique in the others. However, standard PARAFAC analysis of fallible data will have neither of these features. Instead, latent linear dependences become high surface correlations and any latent nonuniqueness is replaced by a meaningless surface-level 'unique orientation' that optimally fits the particular random noise in that sample. To avoid these problems, any set of components that in theory should be rank deficient are re-expressed in PARALIND as a product of two matrices, one that explicitly represents their dependency relationships and another, with fewer columns, that captures their patterns of variation. To demonstrate the approach, we apply it first to fluorescence spectroscopy (excitation-emission matrices, EEM) data in which concentration values for two analytes covary exactly, and then to flow injection analysis (FIA) data in which subsets of columns are logically constrained to sum to a constant, but differently in each of two modes. In the PARAFAC solutions of the EEM data, all factors are 'unique' but this is only meaningful for two of the factors that are also unique at the latent level. In contrast, the PARALIND solutions directly display the extent and nature of partial nonuniqueness present at the latent level by exhibiting a corresponding partial uniqueness in their recovered loadings. For the FIA data, PARALIND constraints restore latent uniqueness to the concentration estimates. Comparison of the solutions shows that PARALIND more accurately recovers latent structure, presumably because it uses fewer parameters and hence fits less error.",

keywords = "Constrained ALS estimation, Equality constraints, Linear dependence, Multi-mode factor analysis, PARAFAC, PARATUCK2, Uniqueness",

author = "Rasmus Bro and Harshman, {Richard A.} and Sidiropoulos, {Nicholas D.} and Lundy, {Margaret E.}",

year = "2009",

doi = "10.1002/cem.1206",

language = "English (US)",

volume = "23",

pages = "324--340",

journal = "Journal of Chemometrics",

issn = "0886-9383",

publisher = "John Wiley and Sons Ltd",

number = "7-8",

}

TY - JOUR

T1 - Modeling multi-way data with linearly dependent loadings

AU - Bro, Rasmus

AU - Harshman, Richard A.

AU - Sidiropoulos, Nicholas D.

AU - Lundy, Margaret E.

PY - 2009

Y1 - 2009

N2 - A generalization/specialization of the PARAFAC model is developed that improves its properties when applied to multi-way problems involving linearly dependent factors. This model is called PARALIND (PARAllel profiles with LINear Dependences). Linear dependences can arise when the empirical sources of variation being modeled by factors are causally or logically linked during data generation, or circumstantially linked during data collection. For example, this can occur in a chemical context when end products are related to the precursor or in a psychological context when a single stimulus generates two incompatible feelings at once. For such cases, the most theoretically appropriate PARAFAC model has loading vectors that are linearly dependent in at least one mode, and when collinear, are nonunique in the others. However, standard PARAFAC analysis of fallible data will have neither of these features. Instead, latent linear dependences become high surface correlations and any latent nonuniqueness is replaced by a meaningless surface-level 'unique orientation' that optimally fits the particular random noise in that sample. To avoid these problems, any set of components that in theory should be rank deficient are re-expressed in PARALIND as a product of two matrices, one that explicitly represents their dependency relationships and another, with fewer columns, that captures their patterns of variation. To demonstrate the approach, we apply it first to fluorescence spectroscopy (excitation-emission matrices, EEM) data in which concentration values for two analytes covary exactly, and then to flow injection analysis (FIA) data in which subsets of columns are logically constrained to sum to a constant, but differently in each of two modes. In the PARAFAC solutions of the EEM data, all factors are 'unique' but this is only meaningful for two of the factors that are also unique at the latent level. In contrast, the PARALIND solutions directly display the extent and nature of partial nonuniqueness present at the latent level by exhibiting a corresponding partial uniqueness in their recovered loadings. For the FIA data, PARALIND constraints restore latent uniqueness to the concentration estimates. Comparison of the solutions shows that PARALIND more accurately recovers latent structure, presumably because it uses fewer parameters and hence fits less error.

KW - Constrained ALS estimation

KW - Equality constraints

KW - Linear dependence

KW - Multi-mode factor analysis

KW - PARAFAC

KW - PARATUCK2

KW - Uniqueness

UR - http://www.scopus.com/inward/record.url?scp=70349274902&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70349274902&partnerID=8YFLogxK

U2 - 10.1002/cem.1206

DO - 10.1002/cem.1206

M3 - Article

AN - SCOPUS:70349274902

SN - 0886-9383

VL - 23

SP - 324

EP - 340

JO - Journal of Chemometrics

JF - Journal of Chemometrics

IS - 7-8

ER -
