Least squares after model selection in high-dimensional sparse models

Bibliographic Details
Published in: arXiv.org
Main Authors: Belloni, Alexandre; Chernozhukov, Victor
Format: Paper; Journal Article
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 20.03.2013
Subjects: Convergence; Economic models; Estimating techniques; Estimators; Goodness of fit; Least squares; Mathematics - Probability; Mathematics - Statistics Theory; Regression analysis; Regression models; Sparsity; Statistics - Methodology; Statistics - Theory
Online Access: https://arxiv.org/abs/1001.0188 ; https://www.proquest.com/docview/2085554834
ISSN: 2331-8422
DOI: 10.48550/arxiv.1001.0188

Abstract: In this article we study post-model selection estimators that apply ordinary least squares (OLS) to the model selected by first-step penalized estimators, typically Lasso. It is well known that Lasso can estimate the nonparametric regression function at nearly the oracle rate, and is thus hard to improve upon. We show that the OLS post-Lasso estimator performs at least as well as Lasso in terms of the rate of convergence, and has the advantage of a smaller bias. Remarkably, this performance occurs even if the Lasso-based model selection "fails" in the sense of missing some components of the "true" regression model. By the "true" model, we mean the best s-dimensional approximation to the nonparametric regression function chosen by the oracle. Furthermore, the OLS post-Lasso estimator can perform strictly better than Lasso, in the sense of a strictly faster rate of convergence, if the Lasso-based model selection correctly includes all components of the "true" model as a subset and also achieves sufficient sparsity. In the extreme case, when Lasso perfectly selects the "true" model, the OLS post-Lasso estimator becomes the oracle estimator. An important ingredient in our analysis is a new sparsity bound on the dimension of the model selected by Lasso, which guarantees that this dimension is at most of the same order as the dimension of the "true" model. Our rate results are nonasymptotic and hold in both parametric and nonparametric models. Moreover, our analysis is not limited to the Lasso estimator acting as a selector in the first step, but also applies to any other estimator, for example, various forms of thresholded Lasso, with good rates and good sparsity properties. Our analysis covers both traditional thresholding and a new practical, data-driven thresholding scheme that induces additional sparsity subject to maintaining a certain goodness of fit. The latter scheme has theoretical guarantees similar to those of Lasso or OLS post-Lasso, but it dominates those procedures as well as traditional thresholding in a wide variety of experiments.
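For context, the rates at issue can be written out. The following display is a standard summary from this literature (stated here as background under the usual design conditions, with high probability; it is not quoted from the paper): in an s-sparse model with p regressors and n observations, Lasso attains the near-oracle prediction rate, while the infeasible oracle (OLS on the "true" s-dimensional model) attains the faster rate without the logarithmic factor:

\[
\bigl\| \widehat{f}_{\mathrm{Lasso}} - f \bigr\|_{2,n} \;\lesssim\; \sqrt{\frac{s \log p}{n}},
\qquad
\bigl\| \widehat{f}_{\mathrm{oracle}} - f \bigr\|_{2,n} \;\lesssim\; \sqrt{\frac{s}{n}}.
\]

The paper's main results place OLS post-Lasso between these benchmarks: it matches the first rate in general and approaches the second when model selection succeeds.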
Journal version: Bernoulli 2013, Vol. 19, No. 2, 521-547 (DOI: 10.3150/11-BEJ410)
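The two-step procedure itself is simple to state. Below is a minimal sketch in Python of OLS post-Lasso on simulated data; the penalty level alpha, the Gaussian design, and the dimensions n, p, s are illustrative assumptions, not the paper's choices (the paper uses a specific data-driven penalty).

# Minimal sketch of OLS post-Lasso: Lasso selects a model, OLS refits it.
# All concrete numbers (n, p, s, alpha) are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p, s = 100, 500, 5                  # sample size, ambient dimension, true sparsity
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.0                         # "true" model: first s coefficients
y = X @ beta + rng.standard_normal(n)

# Step 1: first-step penalized estimator (Lasso) acts as the model selector.
lasso = Lasso(alpha=0.1).fit(X, y)
support = np.flatnonzero(lasso.coef_)  # selected model

# Step 2: refit the selected model by OLS, removing the Lasso shrinkage bias.
beta_post = np.zeros(p)
if support.size > 0:                   # guard: Lasso may select an empty model
    ols = LinearRegression().fit(X[:, support], y)
    beta_post[support] = ols.coef_

print("selected model size:", support.size)
print("Lasso l2 error:     ", np.linalg.norm(lasso.coef_ - beta))
print("post-Lasso l2 error:", np.linalg.norm(beta_post - beta))

When the selected model includes the true support without too many extra components, the refit is close to the oracle OLS fit, which is the mechanism behind the strict improvement the abstract describes.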
Copyright: 2013. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.