Iterative Supervised Principal Component Analysis Driven Ligand Design for Regioselective Ti-Catalyzed Pyrrole Synthesis

Xin Yi See; Xuelan Wen; T. Alexander Wheeler; Channing K. Klein; Jason D. Goodpaster; Benjamin R. Reiner; Ian A. Tonks

doi:10.1021/acscatal.0c03939

Chemistry (Twin Cities)

Research output: Contribution to journal › Article › peer-review

19 Scopus citations

Abstract

The rational design of catalysts remains a challenging endeavor within the broader chemical community owing to the myriad variables that can affect key bond-forming events. Designing selective catalysts for any reaction requires an efficient strategy for discovering predictive structure-activity relationships. Herein, we describe the use of iterative supervised principal component analysis (ISPCA) in de novo catalyst design. The regioselective synthesis of 2,5-dimethyl-1,3,4-triphenyl-1H-pyrrole (C) via a Ti-catalyzed formal [2 + 2 + 1] cycloaddition of phenylpropyne and azobenzene was targeted as a proof of principle. The initial reaction conditions led to an unselective mixture of all possible pyrrole regioisomers. ISPCA was conducted on a training set of catalysts, and their performance was regressed against the scores from the top three principal components. Component loadings from this PCA space and k-means clustering were used to inform the design of new test catalysts. The selectivity of a prospective test set was predicted in silico using the ISPCA model, and optimal candidates were synthesized and tested experimentally. This data-driven predictive-modeling workflow was iterated, and after only three generations the catalytic selectivity was improved from 0.5 (statistical mixture of products) to over 11 (>90% C) by incorporating 2,6-dimethyl-4-(pyrrolidin-1-yl)pyridine as a ligand. The origin of catalyst selectivity was probed by examining ISPCA variable loadings in combination with DFT modeling, revealing that ligand lability plays an important role in selectivity. A parallel catalyst search using multivariate linear regression (MLR), a popular approach in catalysis informatics, was also conducted in order to compare these strategies in a hypothetical catalyst scouting campaign. ISPCA appears to be more robust and predictive than MLR when sparse training sets are used that are representative of the data available during the early search for an optimal catalyst. The successful development of a highly selective catalyst without resorting to long, stochastic screening processes demonstrates the inherent power of ISPCA in de novo catalyst design and should motivate the general use of ISPCA in reaction development.
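
The two selectivity values quoted in the abstract can be related to product fractions with a short calculation. The sketch below assumes, since the abstract does not define the quantity explicitly, that selectivity is the ratio of pyrrole C to all other regioisomers combined; the function name `fraction_from_ratio` is hypothetical. Under that assumption, a ratio of 0.5 corresponds to about one third C and a ratio of 11 to roughly 92% C, which is consistent with the ">90% C" figure.

```python
def fraction_from_ratio(ratio: float) -> float:
    """Convert a C : (all other isomers) ratio into the fraction of C in the mixture.

    Assumption (not stated in the abstract): selectivity = C / (sum of other isomers).
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    return ratio / (1.0 + ratio)


# The two selectivity values reported in the abstract.
for ratio in (0.5, 11.0):
    print(f"selectivity {ratio:g} -> {fraction_from_ratio(ratio):.1%} C")
# selectivity 0.5 -> 33.3% C
# selectivity 11 -> 91.7% C
```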

Original language	English (US)
Pages (from-to)	13504-13517
Number of pages	14
Journal	ACS Catalysis
Volume	10
Issue number	22
DOIs	https://doi.org/10.1021/acscatal.0c03939
State	Published - Nov 20 2020

Bibliographical note

Publisher Copyright:
© 2020 American Chemical Society. All rights reserved.

Keywords

DFT
catalyst prediction
iterative supervised principal component analysis
pyrrole
selectivity
titanium

Cite this

@article{d1b8fffa2f3f4fa1b874fa744884e608,
  title = "Iterative Supervised Principal Component Analysis Driven Ligand Design for Regioselective Ti-Catalyzed Pyrrole Synthesis",
  abstract = "The rational design of catalysts remains a challenging endeavor within the broader chemical community owing to the myriad variables that can affect key bond-forming events. Designing selective catalysts for any reaction requires an efficient strategy for discovering predictive structure-activity relationships. Herein, we describe the use of iterative supervised principal component analysis (ISPCA) in de novo catalyst design. The regioselective synthesis of 2,5-dimethyl-1,3,4-triphenyl-1H-pyrrole (C) via a Ti-catalyzed formal [2 + 2 + 1] cycloaddition of phenylpropyne and azobenzene was targeted as a proof of principle. The initial reaction conditions led to an unselective mixture of all possible pyrrole regioisomers. ISPCA was conducted on a training set of catalysts, and their performance was regressed against the scores from the top three principal components. Component loadings from this PCA space and k-means clustering were used to inform the design of new test catalysts. The selectivity of a prospective test set was predicted in silico using the ISPCA model, and optimal candidates were synthesized and tested experimentally. This data-driven predictive-modeling workflow was iterated, and after only three generations the catalytic selectivity was improved from 0.5 (statistical mixture of products) to over 11 (>90% C) by incorporating 2,6-dimethyl-4-(pyrrolidin-1-yl)pyridine as a ligand. The origin of catalyst selectivity was probed by examining ISPCA variable loadings in combination with DFT modeling, revealing that ligand lability plays an important role in selectivity. A parallel catalyst search using multivariate linear regression (MLR), a popular approach in catalysis informatics, was also conducted in order to compare these strategies in a hypothetical catalyst scouting campaign. ISPCA appears to be more robust and predictive than MLR when sparse training sets are used that are representative of the data available during the early search for an optimal catalyst. The successful development of a highly selective catalyst without resorting to long, stochastic screening processes demonstrates the inherent power of ISPCA in de novo catalyst design and should motivate the general use of ISPCA in reaction development.",
  keywords = "DFT, catalyst prediction, iterative supervised principal component analysis, pyrrole, selectivity, titanium",
  author = "See, {Xin Yi} and Wen, Xuelan and Wheeler, {T. Alexander} and Klein, {Channing K.} and Goodpaster, {Jason D.} and Reiner, {Benjamin R.} and Tonks, {Ian A.}",
  year = "2020",
  month = nov,
  day = "20",
  doi = "10.1021/acscatal.0c03939",
  language = "English (US)",
  volume = "10",
  pages = "13504--13517",
  journal = "ACS Catalysis",
  issn = "2155-5435",
  publisher = "American Chemical Society",
  number = "22",
}

TY  - JOUR
T1  - Iterative Supervised Principal Component Analysis Driven Ligand Design for Regioselective Ti-Catalyzed Pyrrole Synthesis
AU  - See, Xin Yi
AU  - Wen, Xuelan
AU  - Wheeler, T. Alexander
AU  - Klein, Channing K.
AU  - Goodpaster, Jason D.
AU  - Reiner, Benjamin R.
AU  - Tonks, Ian A.
PY  - 2020/11/20
Y1  - 2020/11/20
AB  - The rational design of catalysts remains a challenging endeavor within the broader chemical community owing to the myriad variables that can affect key bond-forming events. Designing selective catalysts for any reaction requires an efficient strategy for discovering predictive structure-activity relationships. Herein, we describe the use of iterative supervised principal component analysis (ISPCA) in de novo catalyst design. The regioselective synthesis of 2,5-dimethyl-1,3,4-triphenyl-1H-pyrrole (C) via a Ti-catalyzed formal [2 + 2 + 1] cycloaddition of phenylpropyne and azobenzene was targeted as a proof of principle. The initial reaction conditions led to an unselective mixture of all possible pyrrole regioisomers. ISPCA was conducted on a training set of catalysts, and their performance was regressed against the scores from the top three principal components. Component loadings from this PCA space and k-means clustering were used to inform the design of new test catalysts. The selectivity of a prospective test set was predicted in silico using the ISPCA model, and optimal candidates were synthesized and tested experimentally. This data-driven predictive-modeling workflow was iterated, and after only three generations the catalytic selectivity was improved from 0.5 (statistical mixture of products) to over 11 (>90% C) by incorporating 2,6-dimethyl-4-(pyrrolidin-1-yl)pyridine as a ligand. The origin of catalyst selectivity was probed by examining ISPCA variable loadings in combination with DFT modeling, revealing that ligand lability plays an important role in selectivity. A parallel catalyst search using multivariate linear regression (MLR), a popular approach in catalysis informatics, was also conducted in order to compare these strategies in a hypothetical catalyst scouting campaign. ISPCA appears to be more robust and predictive than MLR when sparse training sets are used that are representative of the data available during the early search for an optimal catalyst. The successful development of a highly selective catalyst without resorting to long, stochastic screening processes demonstrates the inherent power of ISPCA in de novo catalyst design and should motivate the general use of ISPCA in reaction development.
KW  - DFT
KW  - catalyst prediction
KW  - iterative supervised principal component analysis
KW  - pyrrole
KW  - selectivity
KW  - titanium
UR  - http://www.scopus.com/inward/record.url?scp=85096913928&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85096913928&partnerID=8YFLogxK
U2  - 10.1021/acscatal.0c03939
DO  - 10.1021/acscatal.0c03939
M3  - Article
C2  - 34327040
AN  - SCOPUS:85096913928
SN  - 2155-5435
VL  - 10
SP  - 13504
EP  - 13517
JO  - ACS Catalysis
JF  - ACS Catalysis
IS  - 22
ER  - 
