A unified adaptive tensor approximation scheme to accelerate composite convex optimization

BO JIANG; TIANYI LIN; SHUZHONG ZHANG

doi:10.1137/19M1286025

A unified adaptive tensor approximation scheme to accelerate composite convex optimization

BO JIANG, TIANYI LIN, SHUZHONG ZHANG

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

In this paper, we propose a unified two-phase scheme to accelerate any high-order regularized tensor approximation approach on the smooth part of a composite convex optimization model. The proposed scheme has the advantage of not needing to assume any prior knowledge of the Lipschitz constants for the gradient, the Hessian, and/or high-order derivatives. This is achieved by tuning the parameters used in the algorithm adaptively in its process of progression, which has been successfully incorporated in high-order nonconvex optimization [C. Cartis, N. I. M. Gould, and Ph. L. Toint, Found. Comput. Math., 18 (2018), pp. 1073-1107; E. G. Birgin et al., Math. Program., 163 (2017), pp. 359-368]. By adopting a similar approximate measure of the subproblem in [E. G. Birgin et al., Math. Program., 163 (2017), pp. 359-368] for nonconvex optimization, we establish the overall iteration complexity bounds for three specific algorithms to obtain an ϵ-optimal solution for composite convex problems. In general, we show that the adaptive high-order method has an iteration bound of Ol(1ϵ1/(p+1)r) if the first pth-order derivative information is used in the approximation, which has the same iteration complexity as in [M. Baes, Estimate Sequence Methods: Extensions and Approximations, Institute for Operations Research, ETH, Zürich, 2009; Y. Nesterov, Math. Program., published online Nov. 21, 2019, https://doi.org/10.1007/s10107-019-01449-1], where the Lipschitz constants are assumed to be known, and the subproblems are assumed to be solved exactly. Thus, our results partially address the problem of incorporating adaptive strategies into the highorder accelerated methods raised by Nesterov in [Math. Program., published online Nov. 21, 2019, https://doi.org/10.1007/s10107-019-01449-1], although our strategies cannot ensure the convexity of the auxiliary problem, and such adaptive strategies are already popular in high-order nonconvex optimization [C. Cartis, N. I. M. Gould, and Ph. L. Toint, Found. Comput. Math., 18 (2018), pp. 1073-1107; E. G. Birgin et al., Math. Program., 163 (2017), pp. 359-368]. Specifically, we show that the gradient method achieves an iteration complexity on the order of O(ϵ1/1/2r) , which is known to be best possible (cf. [Y. Nesterov, Lectures on Convex Optimization, 2nd ed., Springer, 2018]), while the adaptive cubic regularization methods with the exact/inexact Hessian matrix achieve an iteration complexity on the order of O(1ϵ1/3r) , which matches that of the original accelerated cubic regularization method presented in [Y. Nesterov, Math. Program., 112 (2008), pp. 159-181]. The results of our numerical experiment clearly show the effect of the acceleration displayed in the adaptive Newton's method with cubic regularization on a set of regularized logistic regression instances.

Original language	English (US)
Pages (from-to)	2897-2926
Number of pages	30
Journal	SIAM Journal on Optimization
Volume	30
Issue number	4
DOIs	https://doi.org/10.1137/19M1286025
State	Published - Oct 8 2020
Externally published	Yes

Bibliographical note

Publisher Copyright:
© 2020 Society for Industrial and Applied Mathematics.

Keywords

Acceleration
Adaptive method
Convex optimization
Iteration complexity
Tensor method

Access

10.1137/19M1286025

OpenUrl availability

Full text

Cite this

@article{803e273ec63d47c8a34ce59f771a907b,

title = "A unified adaptive tensor approximation scheme to accelerate composite convex optimization",

abstract = "In this paper, we propose a unified two-phase scheme to accelerate any high-order regularized tensor approximation approach on the smooth part of a composite convex optimization model. The proposed scheme has the advantage of not needing to assume any prior knowledge of the Lipschitz constants for the gradient, the Hessian, and/or high-order derivatives. This is achieved by tuning the parameters used in the algorithm adaptively in its process of progression, which has been successfully incorporated in high-order nonconvex optimization [C. Cartis, N. I. M. Gould, and Ph. L. Toint, Found. Comput. Math., 18 (2018), pp. 1073-1107; E. G. Birgin et al., Math. Program., 163 (2017), pp. 359-368]. By adopting a similar approximate measure of the subproblem in [E. G. Birgin et al., Math. Program., 163 (2017), pp. 359-368] for nonconvex optimization, we establish the overall iteration complexity bounds for three specific algorithms to obtain an ϵ-optimal solution for composite convex problems. In general, we show that the adaptive high-order method has an iteration bound of Ol(1ϵ1/(p+1)r) if the first pth-order derivative information is used in the approximation, which has the same iteration complexity as in [M. Baes, Estimate Sequence Methods: Extensions and Approximations, Institute for Operations Research, ETH, Z{\"u}rich, 2009; Y. Nesterov, Math. Program., published online Nov. 21, 2019, https://doi.org/10.1007/s10107-019-01449-1], where the Lipschitz constants are assumed to be known, and the subproblems are assumed to be solved exactly. Thus, our results partially address the problem of incorporating adaptive strategies into the highorder accelerated methods raised by Nesterov in [Math. Program., published online Nov. 21, 2019, https://doi.org/10.1007/s10107-019-01449-1], although our strategies cannot ensure the convexity of the auxiliary problem, and such adaptive strategies are already popular in high-order nonconvex optimization [C. Cartis, N. I. M. Gould, and Ph. L. Toint, Found. Comput. Math., 18 (2018), pp. 1073-1107; E. G. Birgin et al., Math. Program., 163 (2017), pp. 359-368]. Specifically, we show that the gradient method achieves an iteration complexity on the order of O(ϵ1/1/2r) , which is known to be best possible (cf. [Y. Nesterov, Lectures on Convex Optimization, 2nd ed., Springer, 2018]), while the adaptive cubic regularization methods with the exact/inexact Hessian matrix achieve an iteration complexity on the order of O(1ϵ1/3r) , which matches that of the original accelerated cubic regularization method presented in [Y. Nesterov, Math. Program., 112 (2008), pp. 159-181]. The results of our numerical experiment clearly show the effect of the acceleration displayed in the adaptive Newton's method with cubic regularization on a set of regularized logistic regression instances.",

keywords = "Acceleration, Adaptive method, Convex optimization, Iteration complexity, Tensor method",

author = "BO JIANG and TIANYI LIN and SHUZHONG ZHANG",

note = "Publisher Copyright: {\textcopyright} 2020 Society for Industrial and Applied Mathematics.",

year = "2020",

month = oct,

day = "8",

doi = "10.1137/19M1286025",

language = "English (US)",

volume = "30",

pages = "2897--2926",

journal = "SIAM Journal on Optimization",

issn = "1052-6234",

publisher = "Society for Industrial and Applied Mathematics Publications",

number = "4",

}

TY - JOUR

T1 - A unified adaptive tensor approximation scheme to accelerate composite convex optimization

AU - JIANG, BO

AU - LIN, TIANYI

AU - ZHANG, SHUZHONG

PY - 2020/10/8

Y1 - 2020/10/8

N2 - In this paper, we propose a unified two-phase scheme to accelerate any high-order regularized tensor approximation approach on the smooth part of a composite convex optimization model. The proposed scheme has the advantage of not needing to assume any prior knowledge of the Lipschitz constants for the gradient, the Hessian, and/or high-order derivatives. This is achieved by tuning the parameters used in the algorithm adaptively in its process of progression, which has been successfully incorporated in high-order nonconvex optimization [C. Cartis, N. I. M. Gould, and Ph. L. Toint, Found. Comput. Math., 18 (2018), pp. 1073-1107; E. G. Birgin et al., Math. Program., 163 (2017), pp. 359-368]. By adopting a similar approximate measure of the subproblem in [E. G. Birgin et al., Math. Program., 163 (2017), pp. 359-368] for nonconvex optimization, we establish the overall iteration complexity bounds for three specific algorithms to obtain an ϵ-optimal solution for composite convex problems. In general, we show that the adaptive high-order method has an iteration bound of Ol(1ϵ1/(p+1)r) if the first pth-order derivative information is used in the approximation, which has the same iteration complexity as in [M. Baes, Estimate Sequence Methods: Extensions and Approximations, Institute for Operations Research, ETH, Zürich, 2009; Y. Nesterov, Math. Program., published online Nov. 21, 2019, https://doi.org/10.1007/s10107-019-01449-1], where the Lipschitz constants are assumed to be known, and the subproblems are assumed to be solved exactly. Thus, our results partially address the problem of incorporating adaptive strategies into the highorder accelerated methods raised by Nesterov in [Math. Program., published online Nov. 21, 2019, https://doi.org/10.1007/s10107-019-01449-1], although our strategies cannot ensure the convexity of the auxiliary problem, and such adaptive strategies are already popular in high-order nonconvex optimization [C. Cartis, N. I. M. Gould, and Ph. L. Toint, Found. Comput. Math., 18 (2018), pp. 1073-1107; E. G. Birgin et al., Math. Program., 163 (2017), pp. 359-368]. Specifically, we show that the gradient method achieves an iteration complexity on the order of O(ϵ1/1/2r) , which is known to be best possible (cf. [Y. Nesterov, Lectures on Convex Optimization, 2nd ed., Springer, 2018]), while the adaptive cubic regularization methods with the exact/inexact Hessian matrix achieve an iteration complexity on the order of O(1ϵ1/3r) , which matches that of the original accelerated cubic regularization method presented in [Y. Nesterov, Math. Program., 112 (2008), pp. 159-181]. The results of our numerical experiment clearly show the effect of the acceleration displayed in the adaptive Newton's method with cubic regularization on a set of regularized logistic regression instances.

AB - In this paper, we propose a unified two-phase scheme to accelerate any high-order regularized tensor approximation approach on the smooth part of a composite convex optimization model. The proposed scheme has the advantage of not needing to assume any prior knowledge of the Lipschitz constants for the gradient, the Hessian, and/or high-order derivatives. This is achieved by tuning the parameters used in the algorithm adaptively in its process of progression, which has been successfully incorporated in high-order nonconvex optimization [C. Cartis, N. I. M. Gould, and Ph. L. Toint, Found. Comput. Math., 18 (2018), pp. 1073-1107; E. G. Birgin et al., Math. Program., 163 (2017), pp. 359-368]. By adopting a similar approximate measure of the subproblem in [E. G. Birgin et al., Math. Program., 163 (2017), pp. 359-368] for nonconvex optimization, we establish the overall iteration complexity bounds for three specific algorithms to obtain an ϵ-optimal solution for composite convex problems. In general, we show that the adaptive high-order method has an iteration bound of Ol(1ϵ1/(p+1)r) if the first pth-order derivative information is used in the approximation, which has the same iteration complexity as in [M. Baes, Estimate Sequence Methods: Extensions and Approximations, Institute for Operations Research, ETH, Zürich, 2009; Y. Nesterov, Math. Program., published online Nov. 21, 2019, https://doi.org/10.1007/s10107-019-01449-1], where the Lipschitz constants are assumed to be known, and the subproblems are assumed to be solved exactly. Thus, our results partially address the problem of incorporating adaptive strategies into the highorder accelerated methods raised by Nesterov in [Math. Program., published online Nov. 21, 2019, https://doi.org/10.1007/s10107-019-01449-1], although our strategies cannot ensure the convexity of the auxiliary problem, and such adaptive strategies are already popular in high-order nonconvex optimization [C. Cartis, N. I. M. Gould, and Ph. L. Toint, Found. Comput. Math., 18 (2018), pp. 1073-1107; E. G. Birgin et al., Math. Program., 163 (2017), pp. 359-368]. Specifically, we show that the gradient method achieves an iteration complexity on the order of O(ϵ1/1/2r) , which is known to be best possible (cf. [Y. Nesterov, Lectures on Convex Optimization, 2nd ed., Springer, 2018]), while the adaptive cubic regularization methods with the exact/inexact Hessian matrix achieve an iteration complexity on the order of O(1ϵ1/3r) , which matches that of the original accelerated cubic regularization method presented in [Y. Nesterov, Math. Program., 112 (2008), pp. 159-181]. The results of our numerical experiment clearly show the effect of the acceleration displayed in the adaptive Newton's method with cubic regularization on a set of regularized logistic regression instances.

KW - Acceleration

KW - Adaptive method

KW - Convex optimization

KW - Iteration complexity

KW - Tensor method

UR - http://www.scopus.com/inward/record.url?scp=85094971610&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85094971610&partnerID=8YFLogxK

U2 - 10.1137/19M1286025

DO - 10.1137/19M1286025

M3 - Article

AN - SCOPUS:85094971610

SN - 1052-6234

VL - 30

SP - 2897

EP - 2926

JO - SIAM Journal on Optimization

JF - SIAM Journal on Optimization

IS - 4

ER -

A unified adaptive tensor approximation scheme to accelerate composite convex optimization

Abstract

Bibliographical note

Keywords

Access

OpenUrl availability

Other files and links

Fingerprint

Cite this