The Effects of Individual Differences, Non-Stationarity, and the Importance of Data Partitioning Decisions for Training and Testing of EEG Cross-Participant Models

Bibliographic Details
Published in	Sensors (Basel, Switzerland) Vol. 21; no. 9; p. 3225
Main Authors	Kamrud, Alexander; Borghetti, Brett; Schubert Kabban, Christine
Format	Journal Article
Language	English
Published	Switzerland MDPI AG 06.05.2021
Subjects	Algorithms; Best practice; Classification; covariate shift; cross-participant; Datasets; Deep learning; EEG; Electroencephalography; Humans; individual differences; Individuality; inter-participant; inter-subject variability; Literature reviews; Machine learning; non-stationarity; Probability distribution; Visualization
ISSN	1424-8220
DOI	10.3390/s21093225

Summary:	EEG-based deep learning models have trended toward models that are designed to perform classification on any individual (cross-participant models). However, because EEG varies across participants due to non-stationarity and individual differences, certain guidelines must be followed when partitioning data into training, validation, and testing sets so that the accuracy of cross-participant models is not overestimated. Despite this necessity, the majority of EEG-based cross-participant models have not adopted such guidelines. Furthermore, some data repositories may unwittingly contribute to the problem by providing pre-partitioned test and non-test datasets for reasons such as competition support. In this study, we demonstrate how improper dataset partitioning, and the resulting improper training, validation, and testing of a cross-participant model, lead to overestimated model accuracy. We demonstrate this both mathematically and empirically, using five publicly available datasets. To build the cross-participant models for these datasets, we replicate published results and demonstrate how the model accuracies are significantly reduced when proper EEG cross-participant model guidelines are followed. Our empirical results show that when these guidelines are not followed, the error rates of cross-participant models can be underestimated by between 35% and 3900%. This misrepresentation of model performance for the general population potentially slows scientific progress toward truly high-performing classification models.
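The summary does not spell out the partitioning guidelines themselves. As a rough illustration only, the sketch below assumes one common reading of them: every participant's recordings go into exactly one of the training, validation, or testing sets, so the test set contains only individuals the model has never seen. The function name `split_by_participant`, its parameters, and the toy participant labels are hypothetical and are not taken from the article.

```python
import random
from collections import defaultdict


def split_by_participant(participant_ids, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Return (train, val, test) lists of sample indices with disjoint participants.

    Assumption (not from the article): partitioning is done at the level of
    whole participants rather than individual samples.
    """
    # Group sample indices by the participant who produced them.
    by_participant = defaultdict(list)
    for index, pid in enumerate(participant_ids):
        by_participant[pid].append(index)

    # Shuffle participants, not samples, so each individual moves as a unit.
    participants = sorted(by_participant)
    random.Random(seed).shuffle(participants)

    n_test = max(1, round(len(participants) * test_fraction))
    n_val = max(1, round(len(participants) * val_fraction))
    if n_test + n_val >= len(participants):
        raise ValueError("need at least one participant left for training")

    test_group = participants[:n_test]
    val_group = participants[n_test:n_test + n_val]
    train_group = participants[n_test + n_val:]

    def gather(group):
        # Collect every sample index belonging to the given participants.
        return sorted(i for pid in group for i in by_participant[pid])

    return gather(train_group), gather(val_group), gather(test_group)


# Hypothetical toy data: 10 participants with 4 EEG epochs each.
participant_ids = [f"P{p:02d}" for p in range(10) for _ in range(4)]
train_idx, val_idx, test_idx = split_by_participant(participant_ids)
```

By contrast, shuffling individual samples before splitting would let epochs from the same person land in both the training and testing sets, which is the kind of leakage the summary associates with overestimated accuracy.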