Precise feature selection using suffix array algorithm of bioinformatics
| Published in | International journal of machine learning and cybernetics Vol. 16; no. 7-8; pp. 4265 - 4294 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Springer Berlin Heidelberg, Berlin/Heidelberg, 01.08.2025 |
| Subjects | |
| ISSN | 1868-8071, 1868-808X |
| DOI | 10.1007/s13042-024-02509-5 |
| Summary: | It is crucial to select the most relevant and informative features in a dataset before performing data analysis, and machine learning algorithms perform better when features are selected correctly. Exact feature selection is not solvable in polynomial time: the exact method takes exponential time, so researchers use approximate algorithms to reach semi-optimal solutions. Heuristic and metaheuristic methods, however, struggle to explore and exploit the search space in a balanced manner. To address this problem, the proposed method replaces metaheuristic algorithms with the linear-time SKEW algorithm from bioinformatics. First, each feature is ranked using the Pearson correlation criterion and labeled *A*, *C*, *G*, or *T* according to its rank; the best feature is *A* and the worst is *T*, so the dataset can be viewed as a Deoxyribonucleic Acid (DNA) string. In the second step, the SKEW algorithm determines the lexicographical order of the suffixes, and each suffix is evaluated as a candidate set of selected features. The third step permutes the features, and the first two steps are repeated; after multiple iterations (e.g., ten), the suffix with the lowest cost function is selected. Compared to Simulated Annealing (SA), Genetic Algorithm (GA), Grey Wolf Optimizer (GWO), Grasshopper Optimization Algorithm (GOA), Ant Colony Optimization (ACO), Greedy, Gravitational Search Algorithm (GSA), and Pyramid Gravitational Search Algorithm (PGSA), the proposed algorithm improves the objective function by 19.3%, 7.6%, 80.6%, 102.2%, 39.7%, 105.6%, 38.1%, and 14.2%, respectively. |
|---|---|
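The first two steps described in the summary — ranking features by Pearson correlation, mapping ranks to the DNA alphabet, and ordering the resulting suffixes lexicographically — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: letters are assigned by rank quartile, correlation is taken in absolute value against a target vector, and plain comparison sorting stands in for the linear-time SKEW suffix array construction. The paper's cost function and feature-permutation loop are omitted.

```python
import numpy as np

def features_to_dna(X, y):
    """Label each column of X with A/C/G/T by its Pearson-correlation rank.

    Assumes no constant columns (correlation would be undefined).
    Quartile-based letter assignment is an assumption of this sketch.
    """
    n_features = X.shape[1]
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
    order = np.argsort(-corr)           # best-correlated feature first
    letters = "ACGT"
    labels = [""] * n_features
    quartile = n_features / 4           # size of each rank quartile
    for rank, j in enumerate(order):
        labels[j] = letters[min(int(rank // quartile), 3)]
    return "".join(labels)

def suffix_array(s):
    """Indices of suffixes of s in lexicographic order.

    Plain sorting (O(n^2 log n) worst case) stands in here for the
    linear-time SKEW algorithm used in the paper.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])
```

With the DNA string in hand, each suffix `s[i:]` corresponds to one candidate feature subset under the current feature ordering, so one pass over the suffix array evaluates *n* candidates; the permutation step then reshuffles the columns and the labeling and sorting are repeated.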