Least squares after model selection in high-dimensional sparse models

Bibliographic Details
Published in: arXiv.org
Main Authors: Belloni, Alexandre; Chernozhukov, Victor
Format: Paper; Journal Article
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 20.03.2013
Subjects: Convergence; Economic models; Estimating techniques; Estimators; Goodness of fit; Least squares; Mathematics - Probability; Mathematics - Statistics Theory; Regression analysis; Regression models; Sparsity; Statistics - Methodology; Statistics - Theory
Online Access: https://arxiv.org/abs/1001.0188 ; https://www.proquest.com/docview/2085554834
ISSN: 2331-8422
DOI: 10.48550/arxiv.1001.0188

Abstract: In this article we study post-model selection estimators that apply ordinary least squares (OLS) to the model selected by first-step penalized estimators, typically Lasso. It is well known that Lasso can estimate the nonparametric regression function at nearly the oracle rate, and is thus hard to improve upon. We show that the OLS post-Lasso estimator performs at least as well as Lasso in terms of the rate of convergence, and has the advantage of a smaller bias. Remarkably, this performance occurs even if the Lasso-based model selection "fails" in the sense of missing some components of the "true" regression model. By the "true" model, we mean the best s-dimensional approximation to the nonparametric regression function chosen by the oracle. Furthermore, the OLS post-Lasso estimator can perform strictly better than Lasso, in the sense of a strictly faster rate of convergence, if the Lasso-based model selection correctly includes all components of the "true" model as a subset and also achieves sufficient sparsity. In the extreme case, when Lasso perfectly selects the "true" model, the OLS post-Lasso estimator becomes the oracle estimator. An important ingredient in our analysis is a new sparsity bound on the dimension of the model selected by Lasso, which guarantees that this dimension is at most of the same order as the dimension of the "true" model. Our rate results are nonasymptotic and hold in both parametric and nonparametric models. Moreover, our analysis is not limited to the Lasso estimator acting as a selector in the first step, but also applies to any other estimator, for example, various forms of thresholded Lasso, with good rates and good sparsity properties. Our analysis covers both traditional thresholding and a new practical, data-driven thresholding scheme that induces additional sparsity subject to maintaining a certain goodness of fit. The latter scheme has theoretical guarantees similar to those of Lasso or OLS post-Lasso, but it dominates those procedures as well as traditional thresholding in a wide variety of experiments.
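For context, the rates at issue can be written out. The following display is a standard summary from this literature (stated here as background under the usual design conditions, with high probability; it is not quoted from the paper): in an s-sparse model with p regressors and n observations, Lasso attains the near-oracle prediction rate, while the infeasible oracle (OLS on the "true" s-dimensional model) attains the faster rate without the logarithmic factor:

\[
\bigl\| \widehat{f}_{\mathrm{Lasso}} - f \bigr\|_{2,n} \;\lesssim\; \sqrt{\frac{s \log p}{n}},
\qquad
\bigl\| \widehat{f}_{\mathrm{oracle}} - f \bigr\|_{2,n} \;\lesssim\; \sqrt{\frac{s}{n}}.
\]

The paper's main results place OLS post-Lasso between these benchmarks: it matches the first rate in general and approaches the second when model selection succeeds.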
Journal version: Bernoulli 2013, Vol. 19, No. 2, 521-547 (DOI: 10.3150/11-BEJ410)
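The two-step procedure itself is simple to state. Below is a minimal sketch in Python of OLS post-Lasso on simulated data; the penalty level alpha, the Gaussian design, and the dimensions n, p, s are illustrative assumptions, not the paper's choices (the paper uses a specific data-driven penalty).

# Minimal sketch of OLS post-Lasso: Lasso selects a model, OLS refits it.
# All concrete numbers (n, p, s, alpha) are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p, s = 100, 500, 5                  # sample size, ambient dimension, true sparsity
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.0                         # "true" model: first s coefficients
y = X @ beta + rng.standard_normal(n)

# Step 1: first-step penalized estimator (Lasso) acts as the model selector.
lasso = Lasso(alpha=0.1).fit(X, y)
support = np.flatnonzero(lasso.coef_)  # selected model

# Step 2: refit the selected model by OLS, removing the Lasso shrinkage bias.
beta_post = np.zeros(p)
if support.size > 0:                   # guard: Lasso may select an empty model
    ols = LinearRegression().fit(X[:, support], y)
    beta_post[support] = ols.coef_

print("selected model size:", support.size)
print("Lasso l2 error:     ", np.linalg.norm(lasso.coef_ - beta))
print("post-Lasso l2 error:", np.linalg.norm(beta_post - beta))

When the selected model includes the true support without too many extra components, the refit is close to the oracle OLS fit, which is the mechanism behind the strict improvement the abstract describes.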
Copyright: 2013. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.