Improving Naive Bayes for Regression with Optimized Artificial Surrogate Data

Bibliographic Details
Published in: Applied Artificial Intelligence, Vol. 34, no. 6, pp. 484-514
Main Authors: Mayo, Michael; Frank, Eibe
Format: Journal Article
Language: English
Published: Philadelphia: Taylor & Francis, 11.05.2020
Taylor & Francis Ltd
Taylor & Francis Group
ISSN: 0883-9514; 1087-6545
DOI: 10.1080/08839514.2020.1726615

Summary: Can we evolve better training data for machine learning algorithms? To investigate this question we use population-based optimization algorithms to generate artificial surrogate training data for naive Bayes for regression. We demonstrate that the generalization performance of naive Bayes for regression models is enhanced by training them on the artificial data as opposed to the real data. These results are important for two reasons. Firstly, naive Bayes models are simple and interpretable but frequently underperform compared to more complex "black box" models, and therefore new methods of enhancing accuracy are called for. Secondly, the idea of using the real training data indirectly in the construction of the artificial training data, as opposed to directly for model training, is a novel twist on the usual machine learning paradigm.
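Illustrative sketch (not from the article): the idea in the summary, that real training data is used only to score candidate artificial training sets, can be shown in a few lines of Python. Everything here is an assumption made for illustration: a 1-nearest-neighbour regressor stands in for naive Bayes for regression, an elitist (mu + lambda) evolution strategy stands in for the population-based optimizers the authors used, the data is synthetic, and all function and parameter names are hypothetical.

```python
import random

def predict(surrogate, x):
    """Stand-in learner (assumption): 1-nearest-neighbour over the surrogate points."""
    nearest = min(surrogate, key=lambda p: abs(p[0] - x))
    return nearest[1]

def fitness(surrogate, real_data):
    """Mean squared error on the REAL data of a model built from the surrogate data."""
    return sum((predict(surrogate, x) - y) ** 2 for x, y in real_data) / len(real_data)

def mutate(surrogate, rng, sigma=0.3):
    """Return a copy with Gaussian noise added to one randomly chosen point."""
    child = list(surrogate)
    i = rng.randrange(len(child))
    x, y = child[i]
    child[i] = (x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
    return child

def evolve(real_data, n_points=5, pop_size=10, generations=200, seed=0):
    """Evolve a small artificial training set; real data is only used for scoring."""
    rng = random.Random(seed)
    # Each candidate solution is a whole artificial training set of (x, y) pairs.
    population = [
        [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(n_points)]
        for _ in range(pop_size)
    ]
    initial_best = min(fitness(c, real_data) for c in population)
    for _ in range(generations):
        children = [mutate(c, rng) for c in population]
        # Elitist (mu + lambda) selection: parents and children compete together,
        # so the best score in the population can never get worse.
        ranked = sorted(population + children, key=lambda c: fitness(c, real_data))
        population = ranked[:pop_size]
    best = population[0]
    return best, initial_best, fitness(best, real_data)

# Synthetic "real" data (assumption): y = 2x plus a little Gaussian noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 1) for _ in range(50)]
real_data = [(x, 2 * x + data_rng.gauss(0, 0.1)) for x in xs]

best, initial_mse, final_mse = evolve(real_data)
print(f"best initial MSE: {initial_mse:.4f}, evolved MSE: {final_mse:.4f}")
```

The sketch scores candidates on the same data that guides the search, so it only shows the training-side mechanics. Assessing generalization, as the summary describes, would require evaluating the final model on held-out real data.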