Separability and scatteredness (S&S) ratio-based efficient SVM regularization parameter, kernel, and kernel parameter selection
| Published in | Pattern analysis and applications : PAA Vol. 28; no. 1 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | London: Springer London, 01.03.2025 |
| ISSN | 1433-7541, 1433-755X |
| DOI | 10.1007/s10044-025-01411-2 |
| Summary | Support Vector Machine (SVM) is a robust machine learning algorithm with broad applications in classification, regression, and outlier detection. SVM requires tuning a regularization parameter (RP), which controls the model capacity and the generalization performance. Conventionally, the optimum RP is found by comparing a range of values through the cross-validation (CV) procedure. In addition, for non-linearly separable data, SVM uses kernels. In this case, a set of kernels, each with a set of parameters, referred to as a grid of kernels, is considered. The RP and the grid of kernels are optimized through various forms of deterministic or probabilistic grid search. Existing methods rely heavily on exhaustive searches, which leads to excessive computational complexity, and they provide very limited insight into the underlying data characteristics. This work addresses this issue by proposing a statistical framework that directly relates a dataset's separability and scatteredness to the choice of optimal hyperparameters. By stochastically analyzing the behavior of the regularization parameter, the method shows that SVM performance can be modeled as a function of the newly defined separability and scatteredness (S&S) ratio of the data. Separability measures the distance between classes, and scatteredness measures the spread of the data points. In particular, for the hinge loss cost function, an S&S ratio-based table provides the optimum RP. The S&S ratio of the data can also be used to automatically assess linear or non-linear separability before the SVM algorithm is run. The same S&S ratio-based lookup table also provides the optimum kernel and its parameters before the SVM is trained. Consequently, the computational cost of a CV grid search is reduced to that of a single SVM training run. Simulation results on real data confirm the superiority of the proposed approach over grid-search methods in terms of efficiency and computational complexity. The method performs better than or comparably to existing state-of-the-art methods at a significantly reduced computational cost. |
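The summary does not give formulas for separability or scatteredness, so the sketch below is only a rough illustration under stated assumptions. It assumes a simple centroid-based definition: separability as the Euclidean distance between the two class centroids, and scatteredness as the mean distance of each point to its own class centroid. The names `ss_ratio` and `grid_search_fits` are hypothetical. The sketch does not reproduce the authors' S&S ratio or their lookup table. It also counts the SVM trainings that an exhaustive CV grid search performs, which makes the cost contrast described in the summary concrete.

```python
import math
from statistics import mean


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [mean(column) for column in zip(*points)]


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ss_ratio(class_a, class_b):
    """Hypothetical separability/scatteredness ratio for a two-class dataset.

    ASSUMPTION (illustrative stand-in, not the paper's definition):
      separability  = distance between the two class centroids
      scatteredness = mean distance of each point to its own class centroid
    Larger values mean the classes are far apart relative to their spread.
    """
    ca, cb = centroid(class_a), centroid(class_b)
    separability = euclidean(ca, cb)
    spreads = [euclidean(p, ca) for p in class_a] + [euclidean(p, cb) for p in class_b]
    scatteredness = mean(spreads)
    if scatteredness == 0:
        # Every point sits exactly on its class centroid: the ratio is unbounded.
        return math.inf
    return separability / scatteredness


def grid_search_fits(n_rp_values, n_kernel_configs, n_folds):
    """Number of SVM trainings an exhaustive CV grid search performs."""
    return n_rp_values * n_kernel_configs * n_folds


if __name__ == "__main__":
    square = [(0, 0), (0, 2), (2, 0), (2, 2)]
    far = [(10, 0), (10, 2), (12, 0), (12, 2)]
    near = [(3, 0), (3, 2), (5, 0), (5, 2)]
    print(round(ss_ratio(square, far), 3))   # → 7.071
    print(round(ss_ratio(square, near), 3))  # → 2.121
    print(grid_search_fits(10, 5, 5))        # → 250
```

Computing the ratio takes a single pass over the data. By contrast, a hypothetical grid of 10 RP values and 5 kernel configurations under 5-fold CV requires 250 SVM trainings. The approach summarized above replaces that search with a table lookup followed by one training run.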