Learning algorithms may perform worse with increasing training set size: Algorithm–data incompatibility

Bibliographic Details
Published in: Computational Statistics & Data Analysis, Vol. 74, pp. 181–197
Main Authors: Yousef, Waleed A.; Kundu, Subrata
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.06.2014
ISSN: 0167-9473, 1872-7352
DOI: 10.1016/j.csda.2013.05.021

More Information
Summary: In machine learning problems, a learning algorithm tries to learn the input–output relationship of a system from a training dataset, where the observed outputs are usually corrupted by random noise. From experience, simulations, and special-case theories, most practitioners believe that increasing the size of the training set improves the performance of the learning algorithm. It is shown that this belief does not hold for every pair of learning algorithm and data distribution. In particular, it is proven that for certain distributions and learning algorithms, increasing the training set size can worsen performance, and letting the training set grow without bound can yield the worst performance, even when there is no model misspecification for the input–output relationship. Simulation results and analysis of real datasets are provided to support the mathematical argument.
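The central claim, that for some algorithm–distribution pairs the expected performance degrades as the training set grows, can be illustrated numerically. The Python sketch below is not the authors' construction; it reproduces a separate, well-documented instance of the same phenomenon: minimum-norm least squares on a correctly specified linear model, whose average test error rises as the training size n approaches the fixed input dimension p. All parameter values (p = 50, unit-scale Gaussian inputs and noise) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 50                                  # fixed input dimension (assumed for illustration)
w = rng.normal(size=p) / np.sqrt(p)     # true coefficients; the linear model is correctly specified
sigma = 1.0                             # standard deviation of the additive label noise

def avg_test_mse(n_train, n_test=2000, reps=100):
    """Average test MSE of the minimum-norm least-squares fit at a given training size."""
    errs = []
    for _ in range(reps):
        X = rng.normal(size=(n_train, p))
        y = X @ w + sigma * rng.normal(size=n_train)
        w_hat = np.linalg.pinv(X) @ y   # minimum-norm solution; well defined even when n_train < p
        X_test = rng.normal(size=(n_test, p))
        y_test = X_test @ w + sigma * rng.normal(size=n_test)
        errs.append(np.mean((X_test @ w_hat - y_test) ** 2))
    return float(np.mean(errs))

for n in [10, 25, 40, 48, 50, 60, 100, 200]:
    print(f"n_train = {n:3d}   test MSE ~ {avg_test_mse(n):.2f}")
```

For n below p the fit interpolates its training data, and its variance grows as n approaches p, so the printed test error increases with the training size throughout that range before falling again once n is well above p. This mirrors, in a simple setting, the paper's point that a larger training set is not a universal guarantee of better performance.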