PWFS: A scalable parallel Python module for wrapper feature selection
| Published in | Journal of Innovative Engineering and Natural Science Vol. 5; no. 2; pp. 704 - 719 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | 31.07.2025 |
| ISSN | 2791-7630 |
| DOI | 10.61112/jiens.1639780 |
| Summary: | Feature selection is a crucial step in machine learning and can significantly affect the performance of predictive models. Although various time-efficient algorithms exist, exhaustive search is the only method guaranteed to find the optimal feature subset, and it carries an enormous computational load: beyond a certain feature count, a lifetime would not be enough to complete it. This study proposes a generic, scalable, open-source parallel Python module that finds the best wrapper feature subset with optimized execution time, especially for reasonable feature counts. This parallel wrapper feature selection module, PWFS, is independent of the machine learning algorithm and works with user-defined methods. The framework aims to maximize the benefit on the machine learning side by improving parallel performance and efficiency. The design is built on efficient message-passing communication: the framework distributes the computational load equally among the parallel agents via feature masking. The module is validated on two workstations, one of which supports hyper-threading. Hyper-threading yields an overall performance gain of 19.77%. Across the scenarios and experiments tested, speedups vary and efficiencies reach up to 96.74%, validating the flexible design of the proposed parallel framework. The source code of the module is available at https://github.com/haeren/parallel-feature-selector and https://pypi.org/project/parallel-feature-selector/. |
|---|---|
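
The summary describes splitting an exhaustive wrapper search equally among parallel agents via feature masking. The sketch below illustrates that general idea only and is not the PWFS API: every name in it (`masks_for_rank`, `mask_to_subset`, `best_on_rank`, `exhaustive_search`, `toy_score`) is hypothetical, the round-robin split is an assumed partitioning scheme, and the workers are simulated sequentially rather than run over message passing.

```python
def masks_for_rank(n_features, rank, n_workers):
    """Return the integer masks assigned to one worker (assumed round-robin split)."""
    # Masks 1 .. 2**n - 1 encode every non-empty feature subset as a bit pattern.
    total = 2 ** n_features - 1
    return range(1 + rank, total + 1, n_workers)


def mask_to_subset(mask, n_features):
    """Decode a bit mask into the tuple of selected feature indices."""
    return tuple(i for i in range(n_features) if (mask >> i) & 1)


def best_on_rank(score, n_features, rank, n_workers):
    """Evaluate every subset owned by one worker and keep the best one."""
    best_score, best_subset = float("-inf"), ()
    for mask in masks_for_rank(n_features, rank, n_workers):
        subset = mask_to_subset(mask, n_features)
        value = score(subset)  # user-defined evaluator, e.g. a cross-validated metric
        if value > best_score:
            best_score, best_subset = value, subset
    return best_score, best_subset


def exhaustive_search(score, n_features, n_workers):
    """Sequential stand-in for the parallel run: one local result per worker,
    then a final reduction to the global best (a gather step in a real
    message-passing program)."""
    local_results = [
        best_on_rank(score, n_features, rank, n_workers)
        for rank in range(n_workers)
    ]
    return max(local_results, key=lambda result: result[0])


def toy_score(subset):
    """Hypothetical evaluator: rewards features 1 and 3, penalises subset size."""
    useful = {1, 3}
    return len(useful & set(subset)) - 0.1 * len(subset)


best_score, best_subset = exhaustive_search(toy_score, n_features=5, n_workers=4)
print(best_subset, best_score)
```

Because the masks are dealt out round-robin, per-worker workloads differ by at most one subset, which is one simple way to approximate an equal split. The search still grows as 2^n - 1 evaluations, so the approach stays practical only for moderate feature counts.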