Abstract
In this paper, we propose a unified two-phase scheme to accelerate any high-order regularized tensor approximation approach on the smooth part of a composite convex optimization model. The proposed scheme has the advantage of not needing to assume any prior knowledge of the Lipschitz constants for the gradient, the Hessian, and/or high-order derivatives. This is achieved by tuning the parameters used in the algorithm adaptively in its process of progression, which has been successfully incorporated in high-order nonconvex optimization [C. Cartis, N. I. M. Gould, and Ph. L. Toint, Found. Comput. Math., 18 (2018), pp. 1073-1107; E. G. Birgin et al., Math. Program., 163 (2017), pp. 359-368]. By adopting a similar approximate measure of the subproblem in [E. G. Birgin et al., Math. Program., 163 (2017), pp. 359-368] for nonconvex optimization, we establish the overall iteration complexity bounds for three specific algorithms to obtain an ϵ-optimal solution for composite convex problems. In general, we show that the adaptive high-order method has an iteration bound of Ol(1ϵ1/(p+1)r) if the first pth-order derivative information is used in the approximation, which has the same iteration complexity as in [M. Baes, Estimate Sequence Methods: Extensions and Approximations, Institute for Operations Research, ETH, Zürich, 2009; Y. Nesterov, Math. Program., published online Nov. 21, 2019, https://doi.org/10.1007/s10107-019-01449-1], where the Lipschitz constants are assumed to be known, and the subproblems are assumed to be solved exactly. Thus, our results partially address the problem of incorporating adaptive strategies into the highorder accelerated methods raised by Nesterov in [Math. Program., published online Nov. 21, 2019, https://doi.org/10.1007/s10107-019-01449-1], although our strategies cannot ensure the convexity of the auxiliary problem, and such adaptive strategies are already popular in high-order nonconvex optimization [C. Cartis, N. I. M. Gould, and Ph. L. Toint, Found. Comput. Math., 18 (2018), pp. 1073-1107; E. G. Birgin et al., Math. Program., 163 (2017), pp. 359-368]. Specifically, we show that the gradient method achieves an iteration complexity on the order of O(ϵ1/1/2r) , which is known to be best possible (cf. [Y. Nesterov, Lectures on Convex Optimization, 2nd ed., Springer, 2018]), while the adaptive cubic regularization methods with the exact/inexact Hessian matrix achieve an iteration complexity on the order of O(1ϵ1/3r) , which matches that of the original accelerated cubic regularization method presented in [Y. Nesterov, Math. Program., 112 (2008), pp. 159-181]. The results of our numerical experiment clearly show the effect of the acceleration displayed in the adaptive Newton's method with cubic regularization on a set of regularized logistic regression instances.
Original language | English (US) |
---|---|
Pages (from-to) | 2897-2926 |
Number of pages | 30 |
Journal | SIAM Journal on Optimization |
Volume | 30 |
Issue number | 4 |
DOIs | |
State | Published - Oct 8 2020 |
Externally published | Yes |
Bibliographical note
Publisher Copyright:© 2020 Society for Industrial and Applied Mathematics.
Keywords
- Acceleration
- Adaptive method
- Convex optimization
- Iteration complexity
- Tensor method