## Abstract

Given a dictionary of M_{n} predictors, in a random design regression setting with n observations, we construct estimators that target the best performance among all the linear combinations of the predictors under a sparse l_{q}-norm (0 ≤ q ≤ 1) constraint on the linear coefficients. Besides identifying the optimal rates of convergence, our universal aggregation strategies by model mixing achieve the optimal rates simultaneously over the full range of 0 ≤ q ≤ 1 for any M_{n} and without knowledge of the l_{q}-norm of the best linear coefficients to represent the regression function. To allow model misspecification, our upper bound results are obtained in a framework of aggregation of estimates. A striking feature is that no specific relationship among the predictors is needed to achieve the upper rates of convergence (hence permitting basically arbitrary correlations between the predictors). Therefore, whatever the true regression function (assumed to be uniformly bounded), our estimators automatically exploit any sparse representation of the regression function (if any), to the best extent possible within the -constrained linear combinations for any 0 ≤ q ≤ 1. A sparse approximation result in the l_{q}-hulls turns out to be crucial to adaptively achieve minimax rate optimal aggregation. It precisely characterizes the number of terms needed to achieve a prescribed accuracy of approximation to the best linear combination in an l_{q}-hull for 0 ≤ q ≤ 1. It offers the insight that the minimax rate of -aggregation is basically determined by an effective model size, which is a sparsity index that depends on q, M_{n}, n, and the l_{q}-norm bound in an easily interpretable way based on a classical model selection theory that deals with a large number of models.

Original language | English (US) |
---|---|

Pages (from-to) | 1675-1711 |

Number of pages | 37 |

Journal | Journal of Machine Learning Research |

Volume | 15 |

State | Published - May 2014 |

## Keywords

- High-dimensional sparse learning
- Minimax rate of convergence
- Model selection
- Optimal aggregation